Preface



This edition of Elementary Linear Algebra gives an introductory treatment of linear algebra that is suitable for 
a first undergraduate course. Its aim is to present the fundamentals of linear algebra in the clearest possible 
way — sound pedagogy is the main consideration. Although calculus is not a prerequisite, there is some 
optional material that is clearly marked for students with a calculus background. If desired, that material can 
be omitted without loss of continuity. 

Technology is not required to use this text, but for instructors who would like to use MATLAB, Mathematica,
Maple, or calculators with linear algebra capabilities, we have posted some supporting material that can be
accessed at either of the following Web sites:

www.howardanton.com 

www.wiley.com/college/anton 

Summary of Changes in this Edition 

This edition is a major revision of its predecessor. In addition to including some new material, some of the old 
material has been streamlined to ensure that the major topics can all be covered in a standard course. These 
are the most significant changes: 

• Vectors in 2-space, 3-space, and n-space Chapters 3 and 4 of the previous edition have been combined
into a single chapter. This has enabled us to eliminate some duplicate exposition and to juxtapose concepts
in n-space with those in 2-space and 3-space, thereby conveying more clearly how n-space ideas generalize
those already familiar to the student.

• New Pedagogical Elements Each section now ends with a Concept Review and a Skills mastery that 
provide the student a convenient reference to the main ideas in that section. 

• New Exercises Many new exercises have been added, including a set of True/False exercises at the end of 
most sections. 

• Earlier Coverage of Eigenvalues and Eigenvectors The chapter on eigenvalues and eigenvectors, which 
was Chapter 7 in the previous edition, is Chapter 5 in this edition. 

• Complex Vector Spaces The chapter entitled Complex Vector Spaces in the previous edition has been 
completely revised. The most important ideas are now covered in Section 5.3 and Section 7.5 in the context 
of matrix diagonalization. A brief review of complex numbers is included in the Appendix. 

• Quadratic Forms This material has been extensively rewritten to focus more precisely on the most 
important ideas. 

• New Chapter on Numerical Methods In the previous edition an assortment of topics appeared in the last 
chapter. That chapter has been replaced by a new chapter that focuses exclusively on numerical methods of 
linear algebra. We achieved this by moving those topics not concerned with numerical methods elsewhere 
in the text. 

• Singular-Value Decomposition In recognition of its growing importance, a new section on Singular-Value
Decomposition has been added to the chapter on numerical methods.

• Internet Search and the Power Method A new section on the Power Method and its application to
Internet search engines has been added to the chapter on numerical methods.

• Applications There is an expanded version of this text by Howard Anton and Chris Rorres entitled
Elementary Linear Algebra: Applications Version, 10th (ISBN 9780470432051), whose purpose is to
supplement this version with an extensive body of applications. However, to accommodate instructors who 
asked us to include some applications in this version of the text, we have done so. These are generally less 
detailed than those appearing in the Anton/Rorres text and can be omitted without loss of continuity. 



Hallmark Features 

• Relationships Among Concepts One of our main pedagogical goals is to convey to the student that linear
algebra is a cohesive subject and not simply a collection of isolated definitions and techniques. One way in 
which we do this is by using a crescendo of Equivalent Statements theorems that continually revisit 
relationships among systems of equations, matrices, determinants, vectors, linear transformations, and 
eigenvalues. To get a general sense of how we use this technique see Theorems 1.5.3, 1.6.4, 2.3.8, 4.8.10, 
4.10.4 and then Theorem 5.1.6, for example. 

• Smooth Transition to Abstraction Because the transition from $R^n$ to general vector spaces is difficult for
many students, considerable effort is devoted to explaining the purpose of abstraction and helping the 
student to "visualize" abstract ideas by drawing analogies to familiar geometric ideas. 

• Mathematical Precision When reasonable, we try to be mathematically precise. In keeping with the level
of student audience, proofs are presented in a patient style that is tailored for beginners. There is a brief 
section in the Appendix on how to read proof statements, and there are various exercises in which students 
are guided through the steps of a proof and asked for justification. 

• Suitability for a Diverse Audience This text is designed to serve the needs of students in engineering,
computer science, biology, physics, business, and economics as well as those majoring in mathematics. 

• Historical Notes To give the students a sense of mathematical history and to convey that real people 
created the mathematical theorems and equations they are studying, we have included numerous Historical 
Notes that put the topic being studied in historical perspective. 



About the Exercises 

• Graded Exercise Sets Each exercise set begins with routine drill problems and progresses to problems
with more substance. 

• True/False Exercises Most exercise sets end with a set of True/False exercises that are designed to check
conceptual understanding and logical reasoning. To avoid pure guessing, the students are required to justify 
their responses in some way. 

• Supplementary Exercise Sets Most chapters end with a set of supplementary exercises that tend to be
more challenging and force the student to draw on ideas from the entire chapter rather than a specific 
section. 



Supplementary Materials for Students 

• Student Solutions Manual This supplement provides detailed solutions to most theoretical exercises and
to at least one nonroutine exercise of every type (ISBN 9780470458228). 

• Technology Exercises and Data Files The technology exercises that appeared in the previous edition have
been moved to the Web site that accompanies this text. Those exercises are designed to be solved using
MATLAB, Mathematica, or Maple and are accompanied by data files in all three formats. The exercises and
data can be downloaded from either of the following Web sites.

www.howardanton.com

www.wiley.com/college/anton 



Supplementary Materials for Instructors 

• Instructor's Solutions Manual This supplement provides worked-out solutions to most exercises in the 
text (ISBN 9780470458235). 

• WileyPLUS This is Wiley's proprietary online teaching and learning environment that integrates a
digital version of this textbook with instructor and student resources to fit a variety of teaching and learning 
styles. WileyPLUS will help your students master concepts in a rich and structured environment that is 
available to them 24/7. It will also help you to personalize and manage your course more effectively with 
student assessments, assignments, grade tracking, and other useful tools. 

• Your students will receive timely access to resources that address their individual needs and will 
receive immediate feedback and remediation resources when needed. 

• There are also self-assessment tools that are linked to the relevant portions of the text that will enable 
your students to take control of their own learning and practice. 

• WileyPLUS will help you to identify those students who are falling behind and to intervene in a 
timely manner without waiting for scheduled office hours. 

More information about WileyPLUS can be obtained from your Wiley representative. 

A Guide for the Instructor 

Although linear algebra courses vary widely in content and philosophy, most courses fall into two categories:
those with about 35-40 lectures and those with about 25-30 lectures. Accordingly, we have created long
and short templates as possible starting points for constructing a course outline. Of course, these are just 
guides, and you will certainly want to customize them to fit your local interests and requirements. Neither of 
these sample templates includes applications. Those can be added, if desired, as time permits. 





                                                       Long Template    Short Template

Chapter 1: Systems of Linear Equations and Matrices    7 lectures       6 lectures
Chapter 2: Determinants                                3 lectures       2 lectures
Chapter 3: Euclidean Vector Spaces                     4 lectures       3 lectures
Chapter 4: General Vector Spaces                       10 lectures      10 lectures
Chapter 5: Eigenvalues and Eigenvectors                3 lectures       3 lectures
Chapter 6: Inner Product Spaces                        3 lectures       1 lecture
Chapter 7: Diagonalization and Quadratic Forms         4 lectures       3 lectures
Chapter 8: Linear Transformations                      3 lectures       2 lectures
Total:                                                 37 lectures      30 lectures

CHAPTER 1

Systems of Linear Equations and Matrices



CHAPTER CONTENTS 

1.1. Introduction to Systems of Linear Equations 

1.2. Gaussian Elimination 

1.3. Matrices and Matrix Operations 

1.4. Inverses; Algebraic Properties of Matrices 

1.5. Elementary Matrices and a Method for Finding $A^{-1}$

1.6. More on Linear Systems and Invertible Matrices 

1.7. Diagonal, Triangular, and Symmetric Matrices 

1.8. Applications of Linear Systems 

• Network Analysis (Traffic Flow) 

• Electrical Circuits 

• Balancing Chemical Equations 

• Polynomial Interpolation 

1.9. Leontief Input-Output Models 



INTRODUCTION 



Information in science, business, and mathematics is often organized into rows and 
columns to form rectangular arrays called "matrices" (plural of "matrix"). Matrices often 
appear as tables of numerical data that arise from physical observations, but they occur in 
various mathematical contexts as well. For example, we will see in this chapter that all of 
the information required to solve a system of equations such as 

$$
\begin{aligned}
5x + y &= 3 \\
2x - y &= 4
\end{aligned}
$$

is embodied in the matrix

$$
\begin{bmatrix}
5 & 1 & 3 \\
2 & -1 & 4
\end{bmatrix}
$$



and that the solution of the system can be obtained by performing appropriate operations 
on this matrix. This is particularly important in developing computer programs for solving 
systems of equations because computers are well suited for manipulating arrays of 
numerical information. However, matrices are not simply a notational tool for solving 
systems of equations; they can be viewed as mathematical objects in their own right, and 
there is a rich and important theory associated with them that has a multitude of practical 
applications. It is the study of matrices and related topics that forms the mathematical field 
that we call "linear algebra." In this chapter we will begin our study of matrices. 

1.1 Introduction to Systems of Linear Equations



Systems of linear equations and their solutions constitute one of the major topics that we will study in this 
course. In this first section we will introduce some basic terminology and discuss a method for solving such 
systems. 

Linear Equations 

Recall that in two dimensions a line in a rectangular xy-coordinate system can be represented by an equation of
the form

$$ax + by = c \qquad (a, b \text{ not both } 0)$$

and in three dimensions a plane in a rectangular xyz-coordinate system can be represented by an equation of the
form

$$ax + by + cz = d \qquad (a, b, c \text{ not all } 0)$$

These are examples of "linear equations," the first being a linear equation in the variables x and y and the second
a linear equation in the variables x, y, and z. More generally, we define a linear equation in the n variables
$x_1, x_2, \ldots, x_n$ to be one that can be expressed in the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \tag{1}$$

where $a_1, a_2, \ldots, a_n$ and $b$ are constants, and the a's are not all zero. In the special cases where $n = 2$ or $n = 3$,
we will often use variables without subscripts and write linear equations as

$$a_1x + a_2y = b \qquad (a_1, a_2 \text{ not both } 0) \tag{2}$$

$$a_1x + a_2y + a_3z = b \qquad (a_1, a_2, a_3 \text{ not all } 0) \tag{3}$$

In the special case where $b = 0$, Equation (1) has the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0 \tag{4}$$

which is called a homogeneous linear equation in the variables $x_1, x_2, \ldots, x_n$.

EXAMPLE 1 Linear Equations

Observe that a linear equation does not involve any products or roots of variables. All variables
occur only to the first power and do not appear, for example, as arguments of trigonometric,
logarithmic, or exponential functions. The following are linear equations:

$$x + 3y = 7 \qquad\qquad x_1 - 2x_2 - 3x_3 + x_4 = 0$$

$$\tfrac{1}{2}x - y + 3z = -1 \qquad\qquad x_1 + x_2 + \cdots + x_n = 1$$

The following are not linear equations:

$$x + 3y^2 = 4 \qquad\qquad 3x + 2y - xy = 5$$

$$\sin x + y = 0 \qquad\qquad \sqrt{x_1} + 2x_2 + x_3 = 1$$



A finite set of linear equations is called a system of linear equations or, more briefly, a linear system. The
variables are called unknowns. For example, system (5) that follows has unknowns $x$ and $y$, and system (6) has
unknowns $x_1$, $x_2$, and $x_3$.

$$
\begin{aligned}
5x + y &= 3 \\
2x - y &= 4
\end{aligned}
\tag{5}
$$

$$
\begin{aligned}
4x_1 - x_2 + 3x_3 &= -1 \\
3x_1 + x_2 + 9x_3 &= -4
\end{aligned}
\tag{6}
$$



The double subscripting on the coefficients $a_{ij}$
of the unknowns gives their location in the
system: the first subscript indicates the equation
in which the coefficient occurs, and the second
indicates which unknown it multiplies. Thus, $a_{12}$
is in the first equation and multiplies $x_2$.



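A general linear system of m equations in the n unknowns $x_1, x_2, \ldots, x_n$ can be written as

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{7}
$$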

A solution of a linear system in n unknowns $x_1, x_2, \ldots, x_n$ is a sequence of n numbers $s_1, s_2, \ldots, s_n$ for which
the substitution

$$x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n$$

makes each equation a true statement. For example, the system in (5) has the solution

$$x = 1, \quad y = -2$$

and the system in (6) has the solution

$$x_1 = 1, \quad x_2 = 2, \quad x_3 = -1$$

These solutions can be written more succinctly as

$$(1, -2) \quad \text{and} \quad (1, 2, -1)$$

in which the names of the variables are omitted. This notation allows us to interpret these solutions geometrically
as points in two-dimensional and three-dimensional space. More generally, a solution

$$x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n$$

of a linear system in n unknowns can be written as

$$(s_1, s_2, \ldots, s_n)$$

which is called an ordered n-tuple. With this notation it is understood that all variables appear in the same order
in each equation. If $n = 2$, then the n-tuple is called an ordered pair, and if $n = 3$, then it is called an ordered
triple.
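
The substitution test in this definition is easy to mechanize. The following is a minimal Python sketch, not part of the original text. The helper name `is_solution` and the row layout `[a1, ..., an, b]` are assumptions made for illustration, and system (6) is read here as $4x_1 - x_2 + 3x_3 = -1$, $3x_1 + x_2 + 9x_3 = -4$.

```python
from fractions import Fraction

def is_solution(rows, candidate):
    """Return True if the ordered n-tuple `candidate` satisfies every equation.

    Each row is [a1, a2, ..., an, b] and stands for a1*x1 + ... + an*xn = b.
    Fractions keep the arithmetic exact.
    """
    for row in rows:
        *coeffs, rhs = row
        lhs = sum(Fraction(a) * Fraction(s) for a, s in zip(coeffs, candidate))
        if lhs != Fraction(rhs):
            return False
    return True

# Systems (5) and (6), written as rows of coefficients followed by the constant.
system_5 = [[5, 1, 3], [2, -1, 4]]
system_6 = [[4, -1, 3, -1], [3, 1, 9, -4]]

print(is_solution(system_5, (1, -2)))     # True
print(is_solution(system_6, (1, 2, -1)))  # True
print(is_solution(system_5, (1, 2)))      # False
```

A tuple counts as a solution only if it satisfies every equation, which is why the loop returns as soon as one equation fails.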



Linear Systems with Two and Three Unknowns 

Linear systems in two unknowns arise in connection with intersections of lines. For example, consider the linear 
system 

$$
\begin{aligned}
a_1x + b_1y &= c_1 \\
a_2x + b_2y &= c_2
\end{aligned}
$$

in which the graphs of the equations are lines in the xy-plane. Each solution (x, y) of this system corresponds to a 
point of intersection of the lines, so there are three possibilities (Figure 1.1.1): 

1. The lines may be parallel and distinct, in which case there is no intersection and consequently no solution. 

2. The lines may intersect at only one point, in which case the system has exactly one solution. 

3. The lines may coincide, in which case there are infinitely many points of intersection (the points on the 
common line) and consequently infinitely many solutions. 




[Figure 1.1.1: three panels labeled "No solution," "One solution," and "Infinitely many solutions (coincident lines)"]



In general, we say that a linear system is consistent if it has at least one solution and inconsistent if it has no 
solutions. Thus, a consistent linear system of two equations in two unknowns has either one solution or infinitely 
many solutions — there are no other possibilities. The same is true for a linear system of three equations in three 
unknowns 

$$
\begin{aligned}
a_1x + b_1y + c_1z &= d_1 \\
a_2x + b_2y + c_2z &= d_2 \\
a_3x + b_3y + c_3z &= d_3
\end{aligned}
$$

in which the graphs of the equations are planes. The solutions of the system, if any, correspond to points where 
all three planes intersect, so again we see that there are only three possibilities — no solutions, one solution, or 
infinitely many solutions (Figure 1.1.2). 



[Figure 1.1.2: eight panels showing the possible arrangements of three planes. No solutions: three parallel planes; two parallel planes; no common intersection; two coincident planes parallel to the third. One solution: intersection is a point. Infinitely many solutions: intersection is a line; planes are all coincident (intersection is a plane); two coincident planes (intersection is a line).]

We will prove later that our observations about the number of solutions of linear systems of two equations in two 
unknowns and linear systems of three equations in three unknowns actually hold for all linear systems. That is: 



Every system of linear equations has zero, one, or infinitely many solutions. There are no other 
possibilities. 



EXAMPLE 2 A Linear System with One Solution

Solve the linear system

$$
\begin{aligned}
x - y &= 1 \\
2x + y &= 6
\end{aligned}
$$

Solution We can eliminate x from the second equation by adding -2 times the first equation to
the second. This yields the simplified system

$$
\begin{aligned}
x - y &= 1 \\
3y &= 4
\end{aligned}
$$

From the second equation we obtain $y = \frac{4}{3}$, and on substituting this value in the first equation we
obtain $x = 1 + y = \frac{7}{3}$. Thus, the system has the unique solution

$$x = \tfrac{7}{3}, \quad y = \tfrac{4}{3}$$

Geometrically, this means that the lines represented by the equations in the system intersect at the
single point $\left(\frac{7}{3}, \frac{4}{3}\right)$. We leave it for you to check this by graphing the lines.
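
The elimination step used in this example can be sketched in a few lines of Python. This is an illustrative sketch rather than the book's own procedure: the function name `solve_2x2_by_elimination` is hypothetical, and the code assumes that the coefficient of x in the first equation is nonzero and that the system has exactly one solution.

```python
from fractions import Fraction

def solve_2x2_by_elimination(eq1, eq2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by eliminating x.

    Assumes a1 != 0. Raises ValueError when elimination leaves no y term,
    that is, when the system does not have exactly one solution.
    """
    a1, b1, c1 = (Fraction(v) for v in eq1)
    a2, b2, c2 = (Fraction(v) for v in eq2)
    m = -a2 / a1                 # add m times the first equation to the second
    b2_new = b2 + m * b1         # coefficient of y after x is eliminated
    c2_new = c2 + m * c1
    if b2_new == 0:
        raise ValueError("the system does not have a unique solution")
    y = c2_new / b2_new
    x = (c1 - b1 * y) / a1       # back-substitute into the first equation
    return x, y

x, y = solve_2x2_by_elimination((1, -1, 1), (2, 1, 6))
print(x, y)  # 7/3 4/3
```

Exact fractions are used so that the result matches the hand computation instead of a rounded decimal.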



EXAMPLE 3 A Linear System with No Solutions

Solve the linear system

$$
\begin{aligned}
x + y &= 4 \\
3x + 3y &= 6
\end{aligned}
$$

Solution We can eliminate x from the second equation by adding -3 times the first equation to
the second equation. This yields the simplified system

$$
\begin{aligned}
x + y &= 4 \\
0 &= -6
\end{aligned}
$$

The second equation is contradictory, so the given system has no solution. Geometrically, this
means that the lines corresponding to the equations in the original system are parallel and distinct.
We leave it for you to check this by graphing the lines or by showing that they have the same slope
but different y-intercepts.

EXAMPLE 4 A Linear System with Infinitely Many Solutions

Solve the linear system

$$
\begin{aligned}
4x - 2y &= 1 \\
16x - 8y &= 4
\end{aligned}
$$

Solution We can eliminate x from the second equation by adding -4 times the first equation to
the second. This yields the simplified system

$$
\begin{aligned}
4x - 2y &= 1 \\
0 &= 0
\end{aligned}
$$

The second equation does not impose any restrictions on x and y and hence can be omitted. Thus,
the solutions of the system are those values of x and y that satisfy the single equation

$$4x - 2y = 1 \tag{8}$$

Geometrically, this means the lines corresponding to the two equations in the original system
coincide. One way to describe the solution set is to solve this equation for x in terms of y to obtain
$x = \frac{1}{4} + \frac{1}{2}y$ and then assign an arbitrary value t (called a parameter) to y. This allows us to
express the solution by the pair of equations (called parametric equations)

$$x = \tfrac{1}{4} + \tfrac{1}{2}t, \quad y = t$$

We can obtain specific numerical solutions from these equations by substituting numerical values
for the parameter. For example, $t = 0$ yields the solution $\left(\frac{1}{4}, 0\right)$, $t = 1$ yields the solution $\left(\frac{3}{4}, 1\right)$,
and $t = -1$ yields the solution $\left(-\frac{1}{4}, -1\right)$. You can confirm that these are solutions by
substituting the coordinates into the given equations.

In Example 4 we could have also obtained
parametric equations for the solutions by
solving (8) for y in terms of x, and letting
$x = t$ be the parameter. The resulting
parametric equations would look different
but would define the same solution set.
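
The three outcomes seen in Examples 2 through 4 can be told apart mechanically. The sketch below is an illustration only: the function name `classify_2x2` is hypothetical, and it uses a cross-multiplication test for proportional rows, which is a shortcut and not the elimination argument used in the text. It assumes each equation really describes a line (its two coefficients are not both zero), and it reads Example 3 as $x + y = 4$, $3x + 3y = 6$.

```python
def classify_2x2(eq1, eq2):
    """Classify a1*x + b1*y = c1, a2*x + b2*y = c2 by its number of solutions.

    Assumes each equation is a genuine line: (a1, b1) and (a2, b2) are not
    both zero.
    """
    a1, b1, c1 = eq1
    a2, b2, c2 = eq2
    # Coefficient rows that are not proportional give lines that cross once.
    if a1 * b2 - a2 * b1 != 0:
        return "one solution"
    # Proportional coefficients: the lines coincide only if the constants
    # are in the same proportion as well.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many solutions"
    return "no solution"

print(classify_2x2((1, -1, 1), (2, 1, 6)))    # Example 2: one solution
print(classify_2x2((1, 1, 4), (3, 3, 6)))     # Example 3: no solution
print(classify_2x2((4, -2, 1), (16, -8, 4)))  # Example 4: infinitely many solutions
```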



EXAMPLE 5 A Linear System with Infinitely Many Solutions

Solve the linear system

$$
\begin{aligned}
x - y + 2z &= 5 \\
2x - 2y + 4z &= 10 \\
3x - 3y + 6z &= 15
\end{aligned}
$$

Solution This system can be solved by inspection, since the second and third equations are
multiples of the first. Geometrically, this means that the three planes coincide and that those values
of x, y, and z that satisfy the equation

$$x - y + 2z = 5 \tag{9}$$

automatically satisfy all three equations. Thus, it suffices to find the solutions of (9). We can do this
by first solving (9) for x in terms of y and z, then assigning arbitrary values r and s (parameters) to
these two variables, and then expressing the solution by the three parametric equations

$$x = 5 + r - 2s, \quad y = r, \quad z = s$$

Specific solutions can be obtained by choosing numerical values for the parameters r and s. For
example, taking $r = 1$ and $s = 0$ yields the solution $(6, 1, 0)$.
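
A quick way to gain confidence in parametric equations is to substitute several parameter values back into the original systems. The following Python sketch does this for Examples 4 and 5. The function names `example4_point` and `example5_point` are hypothetical and exist only for this check.

```python
from fractions import Fraction

def example4_point(t):
    """Point on the solution line of Example 4: x = 1/4 + t/2, y = t."""
    t = Fraction(t)
    return (Fraction(1, 4) + t / 2, t)

def example5_point(r, s):
    """Point on the solution plane of Example 5: x = 5 + r - 2s, y = r, z = s."""
    r, s = Fraction(r), Fraction(s)
    return (5 + r - 2 * s, r, s)

# Every parameter value should satisfy both equations of Example 4 ...
for t in (0, 1, -1, Fraction(7, 3)):
    x, y = example4_point(t)
    assert 4 * x - 2 * y == 1
    assert 16 * x - 8 * y == 4

# ... and every pair (r, s) should satisfy all three equations of Example 5.
for r, s in ((1, 0), (0, 0), (-2, 5)):
    x, y, z = example5_point(r, s)
    assert x - y + 2 * z == 5
    assert 2 * x - 2 * y + 4 * z == 10
    assert 3 * x - 3 * y + 6 * z == 15

print("all sampled parameter values satisfy the original equations")
```

Sampling a handful of values does not prove the parametrization is correct, but it catches sign and coefficient slips quickly.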



Augmented Matrices and Elementary Row Operations 

As the number of equations and unknowns in a linear system increases, so does the complexity of the algebra 
involved in finding solutions. The required computations can be made more manageable by simplifying notation 
and standardizing procedures. For example, by mentally keeping track of the location of the +'s, the x's, and the 
='s in the linear system 



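$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
$$

we can abbreviate the system by writing only the rectangular array of numbers

$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{bmatrix}
$$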



As noted in the introduction to this chapter, the 
term "matrix" is used in mathematics to denote a 
rectangular array of numbers. In a later section 
we will study matrices in detail, but for now we 
will only be concerned with augmented matrices 
for linear systems. 

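This is called the augmented matrix for the system. For example, the augmented matrix for the system of
equations

$$
\begin{aligned}
x_1 + x_2 + 2x_3 &= 9 \\
2x_1 + 4x_2 - 3x_3 &= 1 \\
3x_1 + 6x_2 - 5x_3 &= 0
\end{aligned}
$$

is

$$
\begin{bmatrix}
1 & 1 & 2 & 9 \\
2 & 4 & -3 & 1 \\
3 & 6 & -5 & 0
\end{bmatrix}
$$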
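
Forming an augmented matrix amounts to appending each right-hand constant to its row of coefficients. A minimal Python sketch follows. The function name `augmented_matrix` and the list-of-lists representation are assumptions made for illustration.

```python
def augmented_matrix(coefficients, constants):
    """Append each constant b_i to row i of the coefficient array."""
    if len(coefficients) != len(constants):
        raise ValueError("need exactly one constant per equation")
    return [list(row) + [b] for row, b in zip(coefficients, constants)]

# Coefficients and constants of the system x1 + x2 + 2x3 = 9, 2x1 + 4x2 - 3x3 = 1,
# 3x1 + 6x2 - 5x3 = 0.
coefficients = [[1, 1, 2],
                [2, 4, -3],
                [3, 6, -5]]
constants = [9, 1, 0]

for row in augmented_matrix(coefficients, constants):
    print(row)
```

An unknown that is absent from an equation must be entered with coefficient 0 so that every row lists the unknowns in the same order.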



The basic method for solving a linear system is to perform appropriate algebraic operations on the system that do 
not alter the solution set and that produce a succession of increasingly simpler systems, until a point is reached 
where it can be ascertained whether the system is consistent, and if so, what its solutions are. Typically, the 
algebraic operations are as follows: 

1. Multiply an equation through by a nonzero constant. 

2. Interchange two equations. 

3. Add a constant times one equation to another. 

Since the rows (horizontal lines) of an augmented matrix correspond to the equations in the associated system, 
these three operations correspond to the following operations on the rows of the augmented matrix: 

1. Multiply a row through by a nonzero constant. 

2. Interchange two rows. 

3. Add a constant times one row to another. 

These are called elementary row operations on a matrix. 
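
The three elementary row operations translate directly into small functions. The sketch below is one possible rendering in Python, not a definitive implementation. The function names are hypothetical, rows are indexed from 0, and each function returns a new matrix instead of modifying its argument.

```python
from fractions import Fraction

def scale_row(matrix, i, c):
    """Operation 1: multiply row i through by a nonzero constant c."""
    if c == 0:
        raise ValueError("the constant must be nonzero")
    return [[c * v for v in row] if k == i else list(row)
            for k, row in enumerate(matrix)]

def swap_rows(matrix, i, j):
    """Operation 2: interchange rows i and j."""
    result = [list(row) for row in matrix]
    result[i], result[j] = result[j], result[i]
    return result

def add_multiple(matrix, src, dst, c):
    """Operation 3: add c times row src to row dst."""
    result = [list(row) for row in matrix]
    result[dst] = [d + c * s for s, d in zip(matrix[src], matrix[dst])]
    return result

augmented = [[1, 1, 2, 9], [2, 4, -3, 1], [3, 6, -5, 0]]
step = add_multiple(augmented, 0, 1, -2)      # add -2 times row 0 to row 1
print(step[1])                                # [0, 2, -7, -17]
print(swap_rows(augmented, 0, 2)[0])          # [3, 6, -5, 0]
halved = scale_row(step, 1, Fraction(1, 2))   # exact halves via Fraction
print([str(v) for v in halved[1]])            # ['0', '1', '-7/2', '-17/2']
```

Rejecting `c == 0` in `scale_row` mirrors the requirement that the constant be nonzero: multiplying an equation by zero would discard the information it carries.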

In the following example we will illustrate how to use elementary row operations and an augmented matrix to 
solve a linear system in three unknowns. Since a systematic procedure for solving linear systems will be 
developed in the next section, do not worry about how the steps in the example were chosen. Your objective here 
should be simply to understand the computations. 



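EXAMPLE 6 Using Elementary Row Operations

At each step below we solve a system of linear equations by operating on the equations in the
system (shown on the left), and we solve the same system by operating on the rows of the
augmented matrix (shown on the right).

$$
\begin{aligned}
x + y + 2z &= 9 \\
2x + 4y - 3z &= 1 \\
3x + 6y - 5z &= 0
\end{aligned}
\qquad\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
2 & 4 & -3 & 1 \\
3 & 6 & -5 & 0
\end{bmatrix}
$$

Add -2 times the first equation to the second (add -2 times the first row to the second) to obtain

$$
\begin{aligned}
x + y + 2z &= 9 \\
2y - 7z &= -17 \\
3x + 6y - 5z &= 0
\end{aligned}
\qquad\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 2 & -7 & -17 \\
3 & 6 & -5 & 0
\end{bmatrix}
$$

Add -3 times the first equation to the third (add -3 times the first row to the third) to obtain

$$
\begin{aligned}
x + y + 2z &= 9 \\
2y - 7z &= -17 \\
3y - 11z &= -27
\end{aligned}
\qquad\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 2 & -7 & -17 \\
0 & 3 & -11 & -27
\end{bmatrix}
$$

Multiply the second equation by $\frac{1}{2}$ (multiply the second row by $\frac{1}{2}$) to obtain

$$
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
3y - 11z &= -27
\end{aligned}
\qquad\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 3 & -11 & -27
\end{bmatrix}
$$

Add -3 times the second equation to the third (add -3 times the second row to the third) to obtain

$$
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
-\tfrac{1}{2}z &= -\tfrac{3}{2}
\end{aligned}
\qquad\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & -\tfrac{1}{2} & -\tfrac{3}{2}
\end{bmatrix}
$$

Multiply the third equation by -2 (multiply the third row by -2) to obtain

$$
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
z &= 3
\end{aligned}
\qquad\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & 1 & 3
\end{bmatrix}
$$

Add -1 times the second equation to the first (add -1 times the second row to the first) to obtain

$$
\begin{aligned}
x + \tfrac{11}{2}z &= \tfrac{35}{2} \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
z &= 3
\end{aligned}
\qquad\qquad
\begin{bmatrix}
1 & 0 & \tfrac{11}{2} & \tfrac{35}{2} \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & 1 & 3
\end{bmatrix}
$$

Add $-\frac{11}{2}$ times the third equation to the first and $\frac{7}{2}$ times the third equation to the second
(add $-\frac{11}{2}$ times the third row to the first and $\frac{7}{2}$ times the third row to the second) to obtain

$$
\begin{aligned}
x &= 1 \\
y &= 2 \\
z &= 3
\end{aligned}
\qquad\qquad
\begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & 3
\end{bmatrix}
$$

The solution $x = 1$, $y = 2$, $z = 3$ is now evident.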
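
The sequence of row operations in this example can be replayed exactly in Python. The sketch below is illustrative. The helper names `scale` and `add_multiple` are hypothetical, rows are indexed from 0, the helpers modify the matrix in place, and `Fraction` is used so that entries such as 7/2 stay exact.

```python
from fractions import Fraction as F

def scale(m, i, c):
    """Multiply row i of m by the nonzero constant c (in place)."""
    m[i] = [c * v for v in m[i]]

def add_multiple(m, src, dst, c):
    """Add c times row src to row dst of m (in place)."""
    m[dst] = [d + c * s for s, d in zip(m[src], m[dst])]

# Augmented matrix of x + y + 2z = 9, 2x + 4y - 3z = 1, 3x + 6y - 5z = 0.
m = [[F(v) for v in row] for row in [[1, 1, 2, 9],
                                     [2, 4, -3, 1],
                                     [3, 6, -5, 0]]]

add_multiple(m, 0, 1, -2)         # add -2 times the first row to the second
add_multiple(m, 0, 2, -3)         # add -3 times the first row to the third
scale(m, 1, F(1, 2))              # multiply the second row by 1/2
add_multiple(m, 1, 2, -3)         # add -3 times the second row to the third
scale(m, 2, -2)                   # multiply the third row by -2
add_multiple(m, 1, 0, -1)         # add -1 times the second row to the first
add_multiple(m, 2, 0, F(-11, 2))  # add -11/2 times the third row to the first
add_multiple(m, 2, 1, F(7, 2))    # add 7/2 times the third row to the second

solution = [row[-1] for row in m]
print([str(v) for v in solution])  # ['1', '2', '3']
```

Reading the last column of the final matrix gives x, y, and z in that order, because each of the first three columns now contains a single 1.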



Maxime Bocher (1867-1918) 



Historical Note The first known use of augmented matrices appeared between 200 B.C.
and 100 B.C. in a Chinese manuscript entitled Nine Chapters of Mathematical Art. The
coefficients were arranged in columns rather than in rows, as today, but remarkably the
system was solved by performing a succession of operations on the columns. The actual
use of the term augmented matrix appears to have been introduced by the American
mathematician Maxime Bocher in his book Introduction to Higher Algebra, published in
1907. In addition to being an outstanding research mathematician and an expert in Latin,
chemistry, philosophy, zoology, geography, meteorology, art, and music, Bocher was an
outstanding expositor of mathematics whose elementary textbooks were greatly
appreciated by students and are still in demand today.
[Image: Courtesy of the American Mathematical Society] 



Concept Review 

• Linear equation 

• Homogeneous linear equation 

• System of linear equations

• Solution of a linear system 

• Ordered n-tuple

• Consistent linear system 

• Inconsistent linear system 

• Parameter 

• Parametric equations 

• Augmented matrix 

• Elementary row operations



Skills

• Determine whether a given equation is linear.

• Determine whether a given n-tuple is a solution of a linear system.

• Find the augmented matrix of a linear system.

• Find the linear system corresponding to a given augmented matrix.

• Perform elementary row operations on a linear system and on its corresponding augmented matrix.

• Determine whether a linear system is consistent or inconsistent.

• Find the set of solutions to a consistent linear system.

Exercise Set 1.1

1. In each part, determine whether the equation is linear in $x_1$, $x_2$, and $x_3$.

(b) $x_1 + 3x_2 + x_1x_3 = 2$

(c) $x_1 = -7x_2 + 3x_3$


Answer: 



(a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations 
2. In each part, determine whether the equations form a linear system. 



(a) $-2x + 4y + z = 2$

(b) $x = 4$
$2x = 8$

(c) $4x - y + 2z = -1$
$-x + (\ln 2)y - 3z = 0$

(d) $3z + x = -4$
$y + 5z = 1$
$-x - y - z = 4$

3. In each part, determine whether the equations form a linear system.

(a) 2x1 ^ X4 = 5 
— 7:1+ 5x2 + ^^3 " 2^4= — 1 

(b) sm(27ri+7:3) = /5 

2X2-2X4 J_ 

4:^4 = 4 

(c) 7X1 - X2+ 2X3 = 0 
2x1 + ^2 -^3^4 = 3 
-XI + 5x2 - ^4= - 1 

(d) 2^1+^2 = ^3 + ^4 

Answer: 

(a) and (d) are linear systems; (b) and (c) are not linear systems

4. For each system in Exercise 2 that is linear, determine whether it is consistent.

5. For each system in Exercise 3 that is linear, determine whether it is consistent.

Answer: 

(a) and (d) are both consistent 

6. Write a system of linear equations consisting of three equations in three unknowns with

(a) no solutions. 

(b) exactly one solution. 

(c) infinitely many solutions. 

7. In each part, determine whether the given vector is a solution of the linear system 

$$
\begin{aligned}
2x_1 - 4x_2 - x_3 &= 1 \\
x_1 - 3x_2 + x_3 &= 1 \\
3x_1 - 5x_2 - 3x_3 &= 1
\end{aligned}
$$

(a) (3, 1, 1) 

(b) (3,-1,1) 



(c) (13, 5, 2) 

(e) (17,7,5) 
Answer: 

(a), (d), and (e) are solutions; (b) and (c) are not solutions 

8. In each part, determine whether the given vector is a solution of the linear system 

$$
\begin{aligned}
x_1 + 2x_2 - 2x_3 &= 3 \\
3x_1 - x_2 + x_3 &= 1 \\
-x_1 + 5x_2 - 5x_3 &= 5
\end{aligned}
$$

(^)(f f.o) 

(c) (5, 8, 1) 

<«) (|. f. I) 

(e)(f.f,2) 

9. In each part, find the solution set of the linear equation by using parameters as necessary. 

(a) $7x - 5y = 3$

(b) $-8x_1 + 2x_2 - 5x_3 + 6x_4 = 1$

Answer:

(a) $x = \frac{5}{7}t + \frac{3}{7}$
$y = t$

(b) $x_1 = \frac{1}{4}r - \frac{5}{8}s + \frac{3}{4}t - \frac{1}{8}$
$x_2 = r$
$x_3 = s$
$x_4 = t$

10. In each part, find the solution set of the linear equation by using parameters as necessary. 

(a) $3x_1 - 5x_2 + 4x_3 = 7$

(b) $3v - 8w + 2x - y + 4z = 0$

11. In each part, find a system of linear equations corresponding to the given augmented matrix 



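(a) $\begin{bmatrix} 2 & 0 & 0 \\ 3 & -4 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 3 & 0 & -2 & 5 \\ 7 & 1 & 4 & -3 \\ 0 & -2 & 1 & 7 \end{bmatrix}$

(c) $\begin{bmatrix} 7 & 2 & 1 & -3 & 5 \\ 1 & 2 & 4 & 0 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 & -2 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 4 \end{bmatrix}$

Answer:

(a) $2x_1 = 0$
$3x_1 - 4x_2 = 0$
$x_2 = 1$

(b) $3x_1 - 2x_3 = 5$
$7x_1 + x_2 + 4x_3 = -3$
$-2x_2 + x_3 = 7$

(c) $7x_1 + 2x_2 + x_3 - 3x_4 = 5$
$x_1 + 2x_2 + 4x_3 = 1$

(d) $x_1 = 7$
$x_2 = -2$
$x_3 = 3$
$x_4 = 4$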

12. In each part, find a system of linear equations corresponding to the given augmented matrix. 



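(a) $\begin{bmatrix} 2 & -1 \\ -4 & -6 \\ 1 & -1 \\ 3 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 3 & -1 & -1 & -1 \\ 5 & 2 & 0 & -3 & -6 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 2 & 3 & 4 \\ -4 & -3 & -2 & -1 \\ 5 & -6 & 1 & 1 \\ -8 & 0 & 0 & 3 \end{bmatrix}$

(d) $\begin{bmatrix} 3 & 0 & 1 & -4 & 3 \\ -4 & 0 & 4 & 1 & -3 \\ -1 & 3 & 0 & -2 & -9 \\ 0 & 0 & 0 & -1 & -2 \end{bmatrix}$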

13. In each part, find the augmented matrix for the given system of linear equations. 



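(a) $-2x_1 = 6$
$3x_1 = 8$
$9x_1 = -3$

(b) $6x_1 - x_2 + 3x_3 = 4$
$5x_2 - x_3 = 1$

(c) $2x_2 - 3x_4 + x_5 = 0$
$-3x_1 - x_2 + x_3 = -1$
$6x_1 + 2x_2 - x_3 + 2x_4 - 3x_5 = 6$

(d) $x_1 - x_5 = 7$

Answer:

(a) $\begin{bmatrix} -2 & 6 \\ 3 & 8 \\ 9 & -3 \end{bmatrix}$

(b) $\begin{bmatrix} 6 & -1 & 3 & 4 \\ 0 & 5 & -1 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 2 & 0 & -3 & 1 & 0 \\ -3 & -1 & 1 & 0 & 0 & -1 \\ 6 & 2 & -1 & 2 & -3 & 6 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 0 & 0 & -1 & 7 \end{bmatrix}$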

14. In each part, find the augmented matrix for the given system of linear equations. 

(a) $3x_1 - 2x_2 = -1$
$4x_1 + 5x_2 = 3$
$7x_1 + 3x_2 = 2$

(b) $2x_1 + 2x_3 = 1$
$3x_1 - x_2 + 4x_3 = 7$
$6x_1 + x_2 - x_3 = 0$

(c) $x_1 + 2x_2 - x_4 + x_5 = 1$
$3x_2 + x_3 - x_5 = 2$
$x_3 + 7x_4 = 1$

(d) $x_1 = 1$
$x_2 = 2$
$x_3 = 3$

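15. The curve $y = ax^2 + bx + c$ shown in the accompanying figure passes through the points
$(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Show that the coefficients a, b, and c are a solution of the system of
linear equations whose augmented matrix is

$$
\begin{bmatrix}
x_1^2 & x_1 & 1 & y_1 \\
x_2^2 & x_2 & 1 & y_2 \\
x_3^2 & x_3 & 1 & y_3
\end{bmatrix}
$$

[Figure Ex-15: the curve $y = ax^2 + bx + c$ passing through the three points]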



16. Explain why each of the three elementary row operations does not affect the solution set of a linear system. 

17. Show that if the linear equations

$$x_1 + kx_2 = c \quad \text{and} \quad x_1 + lx_2 = d$$

have the same solution set, then the two equations are identical (i.e., $k = l$ and $c = d$).

True-False Exercises 

In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 

(a) A linear system whose equations are all homogeneous must be consistent. 
Answer: 

True 

(b) Multiplying a linear equation through by zero is an acceptable elementary row operation. 
Answer: 

False 

(c) The linear system

$$
\begin{aligned}
x - y &= 3 \\
2x - 2y &= k
\end{aligned}
$$

cannot have a unique solution, regardless of the value of k.

Answer:

True

(d) A single linear equation with two or more unknowns must always have infinitely many solutions.
Answer:

True

(e) If the number of equations in a linear system exceeds the number of unknowns, then the system must be
inconsistent.



Answer: 



False 



(f) If each equation in a consistent linear system is multiplied through by a constant c, then all solutions to the 
new system can be obtained by multiplying solutions from the original system by c. 

Answer: 

False 

(g) Elementary row operations permit one equation in a linear system to be subtracted from another. 
Answer: 

True 

(h) The linear system with corresponding augmented matrix 




is consistent. 
Answer: 

False 



Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



1.2 Gaussian Elimination 



In this section we will develop a systematic procedure for solving systems of linear equations. The procedure is based on 
the idea of performing certain operations on the rows of the augmented matrix for the system that simplifies it to a form 
from which the solution of the system can be ascertained by inspection. 



Considerations in Solving Linear Systems 

When considering methods for solving systems of linear equations, it is important to distinguish between large systems 
that must be solved by computer and small systems that can be solved by hand. For example, there are many applications 
that lead to linear systems in thousands or even millions of unknowns. Large systems require special techniques to deal 
with issues of memory size, roundoff errors, solution time, and so forth. Such techniques are studied in the field of 
numerical analysis and will only be touched on in this text. However, almost all of the methods that are used for large 
systems are based on the ideas that we will develop in this section. 



Echelon Forms 

In Example 6 of the last section, we solved a linear system in the unknowns x, y, and z by reducing the augmented matrix
to the form

\[
\begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & 3
\end{bmatrix}
\]

from which the solution x = 1, y = 2, z = 3 became evident. This is an example of a matrix that is in reduced row
echelon form. To be of this form, a matrix must have the following properties:

1. If a row does not consist entirely of zeros, then the first nonzero number in the row is a 1. We call this a leading 1.

2. If there are any rows that consist entirely of zeros, then they are grouped together at the bottom of the matrix. 

3. In any two successive rows that do not consist entirely of zeros, the leading 1 in the lower row occurs farther to the 
right than the leading 1 in the higher row. 

4. Each column that contains a leading 1 has zeros everywhere else in that column. 

A matrix that has the first three properties is said to be in row echelon form. (Thus, a matrix in reduced row echelon 
form is of necessity in row echelon form, but not conversely.) 
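The four properties above can be checked mechanically. The following is a minimal Python sketch, not part of the text: the function names `leading_index`, `is_row_echelon`, and `is_reduced_row_echelon` are illustrative assumptions, and a matrix is assumed to be given as a list of rows with exact (integer or rational) entries.

```python
def leading_index(row):
    """Return the column index of the first nonzero entry, or None for a zero row."""
    for j, value in enumerate(row):
        if value != 0:
            return j
    return None


def is_row_echelon(matrix):
    """Check properties 1-3 of the definition."""
    previous_lead = -1
    seen_zero_row = False
    for row in matrix:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:          # property 2: zero rows must be at the bottom
            return False
        if row[lead] != 1:         # property 1: first nonzero entry is a 1
            return False
        if lead <= previous_lead:  # property 3: leading 1's move to the right
            return False
        previous_lead = lead
    return True


def is_reduced_row_echelon(matrix):
    """Check properties 1-4 of the definition."""
    if not is_row_echelon(matrix):
        return False
    for i, row in enumerate(matrix):
        lead = leading_index(row)
        if lead is None:
            continue
        # property 4: every other entry in a leading-1 column is zero
        for k, other in enumerate(matrix):
            if k != i and other[lead] != 0:
                return False
    return True
```

For instance, `is_reduced_row_echelon([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]])` holds for the matrix displayed at the start of this subsection, while a matrix such as `[[1, 1, 0], [0, 1, 0], [0, 0, 0]]` passes only `is_row_echelon`.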

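EXAMPLE 1 Row Echelon and Reduced Row Echelon Form

The following matrices are in reduced row echelon form.

\[
\begin{bmatrix}
1 & 0 & 0 & 4 \\
0 & 1 & 0 & 7 \\
0 & 0 & 1 & -1
\end{bmatrix},
\quad
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix},
\quad
\begin{bmatrix}
0 & 1 & -2 & 0 & 1 \\
0 & 0 & 0 & 1 & 3 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix},
\quad
\begin{bmatrix}
0 & 0 \\
0 & 0
\end{bmatrix}
\]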



The following matrices are in row echelon form but not reduced row echelon form. 



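\[
\begin{bmatrix}
1 & 4 & -3 & 7 \\
0 & 1 & 6 & 2 \\
0 & 0 & 1 & 5
\end{bmatrix},
\quad
\begin{bmatrix}
1 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix},
\quad
\begin{bmatrix}
0 & 1 & 2 & 6 & 0 \\
0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]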



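EXAMPLE 2 More on Row Echelon and Reduced Row Echelon Form

As Example 1 illustrates, a matrix in row echelon form has zeros below each leading 1, whereas a matrix in
reduced row echelon form has zeros below and above each leading 1. Thus, with any real numbers substituted for
the *'s, all matrices of the following types are in row echelon form:

\[
\begin{bmatrix}
1 & * & * & * \\
0 & 1 & * & * \\
0 & 0 & 1 & * \\
0 & 0 & 0 & 1
\end{bmatrix},
\quad
\begin{bmatrix}
1 & * & * & * \\
0 & 1 & * & * \\
0 & 0 & 1 & * \\
0 & 0 & 0 & 0
\end{bmatrix},
\quad
\begin{bmatrix}
1 & * & * & * \\
0 & 1 & * & * \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix},
\]

\[
\begin{bmatrix}
0 & 1 & * & * & * & * & * & * & * & * \\
0 & 0 & 0 & 1 & * & * & * & * & * & * \\
0 & 0 & 0 & 0 & 1 & * & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]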



All matrices of the following types are in reduced row echelon form: 



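\[
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\quad
\begin{bmatrix}
1 & 0 & 0 & * \\
0 & 1 & 0 & * \\
0 & 0 & 1 & * \\
0 & 0 & 0 & 0
\end{bmatrix},
\quad
\begin{bmatrix}
1 & 0 & * & * \\
0 & 1 & * & * \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix},
\]

\[
\begin{bmatrix}
0 & 1 & * & 0 & 0 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 1 & 0 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]
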



If, by a sequence of elementary row operations, the augmented matrix for a system of linear equations is put in reduced 
row echelon form, then the solution set can be obtained either by inspection or by converting certain linear equations to 
parametric form. Here are some examples. 

In Example 3 we could, if desired, express the 
solution more succinctly as the 4-tuple (3, -1, 0, 5). 



EXAMPLE 3 Unique Solution

Suppose that the augmented matrix for a linear system in the unknowns $x_1$, $x_2$, $x_3$, and $x_4$ has been reduced
by elementary row operations to

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 3 \\
0 & 1 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 5
\end{bmatrix}
\]

This matrix is in reduced row echelon form and corresponds to the equations

\[
\begin{aligned}
x_1 &= 3 \\
x_2 &= -1 \\
x_3 &= 0 \\
x_4 &= 5
\end{aligned}
\]

Thus, the system has a unique solution, namely, $x_1 = 3$, $x_2 = -1$, $x_3 = 0$, $x_4 = 5$.



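EXAMPLE 4 Linear Systems in Three Unknowns

In each part, suppose that the augmented matrix for a linear system in the unknowns x, y, and z has been
reduced by elementary row operations to the given reduced row echelon form. Solve the system.

\[
\text{(a)}\;
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 2 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\qquad
\text{(b)}\;
\begin{bmatrix}
1 & 0 & 3 & -1 \\
0 & 1 & -4 & 2 \\
0 & 0 & 0 & 0
\end{bmatrix}
\qquad
\text{(c)}\;
\begin{bmatrix}
1 & -5 & 1 & 4 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
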

Solution 

(a) The equation that corresponds to the last row of the augmented matrix is

\[ 0x + 0y + 0z = 1 \]

Since this equation is not satisfied by any values of x, y, and z, the system is inconsistent.

(b) The equation that corresponds to the last row of the augmented matrix is

\[ 0x + 0y + 0z = 0 \]

This equation can be omitted since it imposes no restrictions on x, y, and z; hence, the linear system
corresponding to the augmented matrix is

\[
\begin{aligned}
x + 3z &= -1 \\
y - 4z &= 2
\end{aligned}
\]

Since x and y correspond to the leading 1's in the augmented matrix, we call these the leading
variables. The remaining variables (in this case z) are called free variables. Solving for the leading
variables in terms of the free variables gives

\[
\begin{aligned}
x &= -1 - 3z \\
y &= 2 + 4z
\end{aligned}
\]

From these equations we see that the free variable z can be treated as a parameter and assigned an
arbitrary value, t, which then determines values for x and y. Thus, the solution set can be represented
by the parametric equations

\[ x = -1 - 3t, \quad y = 2 + 4t, \quad z = t \]

By substituting various values for t in these equations we can obtain various solutions of the system.
For example, setting t = 0 yields the solution

\[ x = -1, \quad y = 2, \quad z = 0 \]

and setting t = 1 yields the solution

\[ x = -4, \quad y = 6, \quad z = 1 \]

(c) As explained in part (b), we can omit the equations corresponding to the zero rows, in which case the
linear system associated with the augmented matrix consists of the single equation

\[ x - 5y + z = 4 \tag{1} \]

from which we see that the solution set is a plane in three-dimensional space. Although (1) is a valid
form of the solution set, there are many applications in which it is preferable to express the solution
set in parametric form. We can convert (1) to parametric form by solving for the leading variable x in
terms of the free variables y and z to obtain

\[ x = 4 + 5y - z \]

From this equation we see that the free variables can be assigned arbitrary values, say y = s and z = t,
which then determine the value of x. Thus, the solution set can be expressed parametrically as

\[ x = 4 + 5s - t, \quad y = s, \quad z = t \tag{2} \]



We will usually denote parameters in a
general solution by the letters r, s, t, ..., but
any letters that do not conflict with the names
of the unknowns can be used. For systems
with more than three unknowns, subscripted
letters such as $t_1, t_2, t_3, \ldots$ are convenient.

Formulas, such as (2), that express the solution set of a linear system parametrically have some associated terminology.

DEFINITION 1

If a linear system has infinitely many solutions, then a set of parametric equations from which all solutions can
be obtained by assigning numerical values to the parameters is called a general solution of the system.
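A general solution can be sanity-checked by substituting it back into the equations for several parameter values. The short Python sketch below does this for parts (b) and (c) of Example 4; the function names are illustrative assumptions, not notation from the text, and a check over finitely many values is only a spot check, not a proof.

```python
def solution_b(t):
    """General solution of Example 4(b): x = -1 - 3t, y = 2 + 4t, z = t."""
    return (-1 - 3 * t, 2 + 4 * t, t)


def satisfies_b(x, y, z):
    """The two nonzero equations of Example 4(b)."""
    return x + 3 * z == -1 and y - 4 * z == 2


def solution_c(s, t):
    """General solution of Example 4(c): x = 4 + 5s - t, y = s, z = t."""
    return (4 + 5 * s - t, s, t)


def satisfies_c(x, y, z):
    """The single nonzero equation of Example 4(c)."""
    return x - 5 * y + z == 4


# Spot-check a range of integer parameter values.
all_b_ok = all(satisfies_b(*solution_b(t)) for t in range(-5, 6))
all_c_ok = all(
    satisfies_c(*solution_c(s, t)) for s in range(-5, 6) for t in range(-5, 6)
)
```

Each choice of the parameters produces one particular solution; for example, `solution_b(1)` gives the solution x = -4, y = 6, z = 1 found in Example 4(b).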



Elimination Methods 



We have just seen how easy it is to solve a system of linear equations once its augmented matrix is in reduced row 
echelon form. Now we will give a step-by-step elimination procedure that can be used to reduce any matrix to reduced 
row echelon form. As we state each step in the procedure, we illustrate the idea by reducing the following matrix to 
reduced row echelon form. 

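\[
\begin{bmatrix}
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -10 & 6 & 12 & 28 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

Step 1. Locate the leftmost column that does not consist entirely of zeros.

\[
\begin{bmatrix}
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -10 & 6 & 12 & 28 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

Leftmost nonzero column (column 1)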


Step 2. Interchange the top row with another row, if necessary, to bring a nonzero entry to the top of the column found in
Step 1.

\[
\begin{bmatrix}
2 & 4 & -10 & 6 & 12 & 28 \\
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

The first and second rows in the preceding matrix were interchanged.

Step 3. If the entry that is now at the top of the column found in Step 1 is a, multiply the first row by 1/a in order to
introduce a leading 1.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

The first row of the preceding matrix was multiplied by 1/2.

Step 4. Add suitable multiples of the top row to the rows below so that all entries below the leading 1 become zeros.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & -2 & 0 & 7 & 12 \\
0 & 0 & 5 & 0 & -17 & -29
\end{bmatrix}
\]

-2 times the first row of the preceding matrix was added to the third row.

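Step 5. Now cover the top row in the matrix and begin again with Step 1 applied to the submatrix that remains. Continue
in this way until the entire matrix is in row echelon form.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & -2 & 0 & 7 & 12 \\
0 & 0 & 5 & 0 & -17 & -29
\end{bmatrix}
\]

Leftmost nonzero column in the submatrix (column 3)

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 5 & 0 & -17 & -29
\end{bmatrix}
\]

The first row in the submatrix was multiplied by $-\tfrac{1}{2}$ to introduce a leading 1.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & \tfrac{1}{2} & 1
\end{bmatrix}
\]

-5 times the first row of the submatrix was added to the second row of the submatrix to introduce a zero below the
leading 1.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & \tfrac{1}{2} & 1
\end{bmatrix}
\]

The top row in the submatrix was covered, and we returned again to Step 1.
Leftmost nonzero column in the new submatrix (column 5)

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

The first (and only) row in the new submatrix was multiplied by 2 to introduce a leading 1.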

The entire matrix is now in row echelon form. To find the reduced row echelon form we need the following additional 
step. 

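Step 6. Beginning with the last nonzero row and working upward, add suitable multiples of each row to the rows above
to introduce zeros above the leading 1's.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

7/2 times the third row of the preceding matrix was added to the second row.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 0 & 2 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

-6 times the third row was added to the first row.

\[
\begin{bmatrix}
1 & 2 & 0 & 3 & 0 & 7 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

5 times the second row was added to the first row.

The last matrix is in reduced row echelon form.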



The procedure (or algorithm) we have just described for reducing a matrix to reduced row echelon form is called
Gauss-Jordan elimination. This algorithm consists of two parts, a forward phase in which zeros are introduced below the
leading 1's and then a backward phase in which zeros are introduced above the leading 1's. If only the forward phase is
used, then the procedure produces a row echelon form only and is called Gaussian elimination. For example, in the
preceding computations a row echelon form was obtained at the end of Step 5.
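Steps 1 through 6 translate fairly directly into code. The following Python sketch is one possible rendering, not the text's own implementation: the function name `gauss_jordan` is an assumption, the matrix is assumed to be a nonempty list of equal-length rows, and exact rational arithmetic (`fractions.Fraction`) is used so that no roundoff occurs.

```python
from fractions import Fraction


def gauss_jordan(matrix):
    """Return the reduced row echelon form of `matrix` as a list of rows."""
    a = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(a), len(a[0])
    pivots = []  # (row, column) of each leading 1 found in the forward phase
    r = 0        # index of the top row of the current submatrix

    # Forward phase (Steps 1-5): produce a row echelon form.
    for c in range(cols):
        if r == rows:
            break
        # Steps 1-2: look for a nonzero entry in column c at or below row r.
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue  # column is zero in the submatrix; move right
        a[r], a[pivot] = a[pivot], a[r]
        # Step 3: scale the row to introduce a leading 1.
        lead = a[r][c]
        a[r] = [x / lead for x in a[r]]
        # Step 4: create zeros below the leading 1.
        for i in range(r + 1, rows):
            factor = a[i][c]
            if factor != 0:
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append((r, c))
        r += 1  # Step 5: "cover" the top row and repeat on the submatrix

    # Backward phase (Step 6): create zeros above each leading 1,
    # starting with the last nonzero row and working upward.
    for pr, pc in reversed(pivots):
        for i in range(pr):
            factor = a[i][pc]
            if factor != 0:
                a[i] = [x - factor * y for x, y in zip(a[i], a[pr])]
    return a


example = [
    [0, 0, -2, 0, 7, 12],
    [2, 4, -10, 6, 12, 28],
    [2, 4, -5, 6, -5, -1],
]
reduced = gauss_jordan(example)
```

Applied to the 3 × 6 matrix used to illustrate the steps, the sketch reproduces the final matrix of Step 6. Stopping after the forward phase would give Gaussian elimination instead.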




Carl Friedrich Gauss (1777-1855) 




Wilhelm Jordan (1842-1899) 

Historical Note Although versions of Gaussian elimination were known much earlier, the power of the method
was not recognized until the great German mathematician Carl Friedrich Gauss used it to compute the orbit of
the asteroid Ceres from limited data. What happened was this: On January 1, 1801 the Sicilian astronomer
Giuseppe Piazzi (1746-1826) noticed a dim celestial object that he believed might be a "missing planet." He
named the object Ceres and made a limited number of positional observations but then lost the object as it neared
the Sun. Gauss undertook the problem of computing the orbit from the limited data using least squares and the
procedure that we now call Gaussian elimination. The work of Gauss caused a sensation when Ceres reappeared
a year later in the constellation Virgo at almost the precise position that Gauss predicted! The method was further
popularized by the German engineer Wilhelm Jordan in his handbook on geodesy (the science of measuring
Earth shapes) entitled Handbuch der Vermessungskunde and published in 1888.
[Images: Granger Collection (Gauss); Wikipedia (Jordan)]



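EXAMPLE 5 Gauss-Jordan Elimination

Solve by Gauss-Jordan elimination.

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= -1 \\
5x_3 + 10x_4 + 15x_6 &= 5 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 6
\end{aligned}
\]

Solution The augmented matrix for the system is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
2 & 6 & 0 & 8 & 4 & 18 & 6
\end{bmatrix}
\]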


Adding -2 times the first row to the second and fourth rows gives

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & -1 & -2 & 0 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
0 & 0 & 4 & 8 & 0 & 18 & 6
\end{bmatrix}
\]



Multiplying the second row by -1 and then adding -5 times the new second row to the third row and -4
times the new second row to the fourth row gives

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 6 & 2
\end{bmatrix}
\]



Interchanging the third and fourth rows and then multiplying the third row of the resulting matrix by 1/6
gives the row echelon form

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This completes the forward phase since there are zeros below the leading 1's.



Adding -3 times the third row to the second row and then adding 2 times the second row of the resulting
matrix to the first row yields the reduced row echelon form

\[
\begin{bmatrix}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This completes the backward phase since there are zeros above the leading 1's.



The corresponding system of equations is

\[
\begin{aligned}
x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\
x_3 + 2x_4 &= 0 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\tag{3}
\]

Note that in constructing the linear system in
(3) we ignored the row of zeros in the
corresponding augmented matrix. Why is this
justified?

Solving for the leading variables we obtain

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Finally, we express the general solution of the system parametrically by assigning the free variables $x_2$, $x_4$,
and $x_5$ arbitrary values r, s, and t, respectively. This yields

\[ x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3} \]
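As a check on Example 5, the general solution can be substituted into the original four equations. The Python sketch below does this with exact fractions; the names `A`, `b`, `general_solution`, and `residuals` are illustrative assumptions, and the coefficient rows are copied from the augmented matrix of the example.

```python
from fractions import Fraction

# Coefficient rows and right-hand sides of the system in Example 5.
A = [
    [1, 3, -2, 0, 2, 0],
    [2, 6, -5, -2, 4, -3],
    [0, 0, 5, 10, 0, 15],
    [2, 6, 0, 8, 4, 18],
]
b = [0, -1, 5, 6]


def general_solution(r, s, t):
    """x1..x6 for parameter values r, s, t (the free variables x2, x4, x5)."""
    return [-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, Fraction(1, 3)]


def residuals(x):
    """Left side minus right side of each equation; all zeros for a solution."""
    return [
        sum(coeff * value for coeff, value in zip(row, x)) - rhs
        for row, rhs in zip(A, b)
    ]


# Spot-check several parameter choices.
checks = [
    residuals(general_solution(r, s, t))
    for r in (-2, 0, 3)
    for s in (-1, 0, 4)
    for t in (0, 1, 5)
]
```

Every entry of `checks` should be a list of four zeros, which confirms (for the sampled parameter values) that the parametric formulas satisfy the original system.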



Homogeneous Linear Systems 

A system of linear equations is said to be homogeneous if the constant terms are all zero; that is, the system has the form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0
\end{aligned}
\]

Every homogeneous system of linear equations is consistent because all such systems have
$x_1 = 0, x_2 = 0, \ldots, x_n = 0$ as a solution. This solution is called the trivial solution; if there are other
solutions, they are called nontrivial solutions.

Because a homogeneous linear system always has the trivial solution, there are only two possibilities for its solutions: 

• The system has only the trivial solution. 

• The system has infinitely many solutions in addition to the trivial solution. 

In the special case of a homogeneous linear system of two equations in two unknowns, say

\[
\begin{aligned}
a_1x + b_1y &= 0 \quad (a_1, b_1 \text{ not both zero}) \\
a_2x + b_2y &= 0 \quad (a_2, b_2 \text{ not both zero})
\end{aligned}
\]

the graphs of the equations are lines through the origin, and the trivial solution corresponds to the point of intersection at
the origin (Figure 1.2.1).

Figure 1.2.1 (two panels: "Only the trivial solution" and "Infinitely many solutions")



There is one case in which a homogeneous system is assured of having nontrivial solutions — namely, whenever the 
system involves more unknowns than equations. To see why, consider the following example of four equations in six 
unknowns. 



EXAMPLE 6 A Homogeneous System

Use Gauss-Jordan elimination to solve the homogeneous linear system

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 + 15x_6 &= 0 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned}
\]

Solution Observe first that the coefficients of the unknowns in this system are the same as those in
Example 5; that is, the two systems differ only in the constants on the right side. The augmented matrix for
the given homogeneous system is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & 0 \\
0 & 0 & 5 & 10 & 0 & 15 & 0 \\
2 & 6 & 0 & 8 & 4 & 18 & 0
\end{bmatrix}
\tag{5}
\]



which is the same as the augmented matrix for the system in Example 5, except for zeros in the last
column. Thus, the reduced row echelon form of this matrix will be the same as that of the augmented
matrix in Example 5, except for the last column. However, a moment's reflection will make it evident that
a column of zeros is not changed by an elementary row operation, so the reduced row echelon form of (5) is

\[
\begin{bmatrix}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{6}
\]

The corresponding system of equations is

\[
\begin{aligned}
x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\
x_3 + 2x_4 &= 0 \\
x_6 &= 0
\end{aligned}
\]

Solving for the leading variables we obtain

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= 0
\end{aligned}
\tag{7}
\]

If we now assign the free variables $x_2$, $x_4$, and $x_5$ arbitrary values r, s, and t, respectively, then we can
express the solution set parametrically as

\[ x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0 \]

Note that the trivial solution results when r = s = t = 0.



Free Variable in Homogeneous Linear Systems 

Example 6 illustrates two important points about solving homogeneous linear systems: 

1. Elementary row operations do not alter columns of zeros in a matrix, so the reduced row echelon form of the 
augmented matrix for a homogeneous linear system has a final column of zeros. This implies that the linear system 
corresponding to the reduced row echelon form is homogeneous, just like the original system. 

2. When we constructed the homogeneous linear system corresponding to augmented matrix (6), we ignored the row of
zeros because the corresponding equation

\[ 0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 = 0 \]

does not impose any conditions on the unknowns. Thus, depending on whether or not the reduced row echelon form
of the augmented matrix for a homogeneous linear system has any rows of zeros, the linear system corresponding to
that reduced row echelon form will either have the same number of equations as the original system or it will have
fewer.

Now consider a general homogeneous linear system with n unknowns, and suppose that the reduced row echelon form of
the augmented matrix has r nonzero rows. Since each nonzero row has a leading 1, and since each leading 1 corresponds
to a leading variable, the homogeneous system corresponding to the reduced row echelon form of the augmented matrix
must have r leading variables and n - r free variables. Thus, this system is of the form

\[
\begin{aligned}
x_{k_1} + \textstyle\sum(\;) &= 0 \\
x_{k_2} + \textstyle\sum(\;) &= 0 \\
&\;\;\vdots \\
x_{k_r} + \textstyle\sum(\;) &= 0
\end{aligned}
\]

where in each equation the expression $\sum(\;)$ denotes a sum that involves the free variables, if any [see (7), for
example]. In summary, we have the following result.



THEOREM 1.2.1 Free Variable Theorem for Homogeneous Systems 

If a homogeneous linear system has n unknowns, and if the reduced row echelon form of its augmented matrix 
has r nonzero rows, then the system has n - r free variables. 



Note that Theorem 1.2.2 applies only to
homogeneous systems; a nonhomogeneous system
with more unknowns than equations need not be
consistent. However, we will prove later that if a
nonhomogeneous system with more unknowns than
equations is consistent, then it has infinitely many
solutions.

Theorem 1.2.1 has an important implication for homogeneous linear systems with more unknowns than equations.
Specifically, if a homogeneous linear system has m equations in n unknowns, and if m < n, then it must also be true that
r < n (why?). This being the case, the theorem implies that there is at least one free variable, and this implies in turn that
the system has infinitely many solutions. Thus, we have the following result.



THEOREM 1.2.2 

A homogeneous linear system with more unknowns than equations has infinitely many solutions. 



In retrospect, we could have anticipated that the homogeneous system in Example 6 would have infinitely many 
solutions since it has four equations in six unknowns. 
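The count in Theorem 1.2.1 is easy to compute once a reduced row echelon form is available. The Python sketch below applies it to the reduced row echelon form found in Example 6; the function name `count_free_variables` is an illustrative assumption, and the input is assumed to be the augmented matrix of a homogeneous system already in reduced row echelon form.

```python
def count_free_variables(rref_augmented):
    """Return n - r for a homogeneous system's augmented matrix in RREF.

    n is the number of unknowns (all columns except the last) and
    r is the number of nonzero rows.
    """
    n = len(rref_augmented[0]) - 1
    r = sum(1 for row in rref_augmented if any(entry != 0 for entry in row))
    return n - r


# Reduced row echelon form from Example 6 (four equations, six unknowns).
rref_example_6 = [
    [1, 3, 0, 4, 2, 0, 0],
    [0, 0, 1, 2, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
free_count = count_free_variables(rref_example_6)
```

Here n = 6 and r = 3, so there are three free variables, matching the three parameters r, s, and t in the general solution of Example 6. Since r can never exceed the number of equations, a system with more unknowns than equations always yields a positive count.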



Gaussian Elimination and Back-Substitution

For small linear systems that are solved by hand (such as most of those in this text), Gauss-Jordan elimination (reduction 
to reduced row echelon form) is a good procedure to use. However, for large linear systems that require a computer 
solution, it is generally more efficient to use Gaussian elimination (reduction to row echelon form) followed by a 
technique known as back-substitution to complete the process of solving the system. The next example illustrates this 
technique. 

EXAMPLE 7 Example 5 Solved by Back-Substitution

From the computations in Example 5, a row echelon form of the augmented matrix is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

To solve the corresponding system of equations

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
x_3 + 2x_4 + 3x_6 &= 1 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

we proceed as follows:

Step 1. Solve the equations for the leading variables.

\[
\begin{aligned}
x_1 &= -3x_2 + 2x_3 - 2x_5 \\
x_3 &= 1 - 2x_4 - 3x_6 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Step 2. Beginning with the bottom equation and working upward, successively substitute each equation
into all the equations above it.

Substituting $x_6 = \tfrac{1}{3}$ into the second equation yields

\[
\begin{aligned}
x_1 &= -3x_2 + 2x_3 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Substituting $x_3 = -2x_4$ into the first equation yields

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Step 3. Assign arbitrary values to the free variables, if any.

If we now assign $x_2$, $x_4$, and $x_5$ the arbitrary values r, s, and t, respectively, the general solution is given by
the formulas

\[ x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3} \]

This agrees with the solution obtained in Example 5.



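EXAMPLE 8

Suppose that the matrices below are augmented matrices for linear systems in the unknowns $x_1$, $x_2$, $x_3$, and
$x_4$. These matrices are all in row echelon form but not reduced row echelon form. Discuss the existence
and uniqueness of solutions to the corresponding linear systems.

\[
\text{(a)}\;
\begin{bmatrix}
1 & -3 & 7 & 2 & 5 \\
0 & 1 & 2 & -4 & 1 \\
0 & 0 & 1 & 6 & 9 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\qquad
\text{(b)}\;
\begin{bmatrix}
1 & -3 & 7 & 2 & 5 \\
0 & 1 & 2 & -4 & 1 \\
0 & 0 & 1 & 6 & 9 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
\text{(c)}\;
\begin{bmatrix}
1 & -3 & 7 & 2 & 5 \\
0 & 1 & 2 & -4 & 1 \\
0 & 0 & 1 & 6 & 9 \\
0 & 0 & 0 & 1 & 0
\end{bmatrix}
\]
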

Solution 

(a) The last row corresponds to the equation

\[ 0x_1 + 0x_2 + 0x_3 + 0x_4 = 1 \]

from which it is evident that the system is inconsistent.

(b) The last row corresponds to the equation

\[ 0x_1 + 0x_2 + 0x_3 + 0x_4 = 0 \]

which has no effect on the solution set. In the remaining three equations the variables $x_1$, $x_2$, and $x_3$
correspond to leading 1's and hence are leading variables. The variable $x_4$ is a free variable. With a
little algebra, the leading variables can be expressed in terms of the free variable, and the free variable
can be assigned an arbitrary value. Thus, the system must have infinitely many solutions.

(c) The last row corresponds to the equation

\[ x_4 = 0 \]

which gives us a numerical value for $x_4$. If we substitute this value into the third equation, namely,

\[ x_3 + 6x_4 = 9 \]

we obtain $x_3 = 9$. You should now be able to see that if we continue this process and substitute the
known values of $x_3$ and $x_4$ into the equation corresponding to the second row, we will obtain a unique
numerical value for $x_2$; and if, finally, we substitute the known values of $x_4$, $x_3$, and $x_2$ into the
equation corresponding to the first row, we will produce a unique numerical value for $x_1$. Thus, the
system has a unique solution.
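The substitution process described in part (c) can be carried out mechanically. The Python sketch below handles only the special case illustrated there, a square system whose row echelon form has a leading 1 in every diagonal position (so there are no free variables); the function name `back_substitute` is an illustrative assumption.

```python
from fractions import Fraction


def back_substitute(augmented):
    """Solve a system with n equations in n unknowns by back-substitution.

    Assumes `augmented` is in row echelon form with a leading 1 on each
    diagonal position, as in Example 8(c).
    """
    n = len(augmented)
    x = [Fraction(0)] * n
    # Work from the bottom equation upward.
    for i in range(n - 1, -1, -1):
        row = augmented[i]
        # Contribution of the unknowns that have already been determined.
        known = sum(row[j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(row[n]) - known  # the leading coefficient is 1
    return x


example_8c = [
    [1, -3, 7, 2, 5],
    [0, 1, 2, -4, 1],
    [0, 0, 1, 6, 9],
    [0, 0, 0, 1, 0],
]
solution = back_substitute(example_8c)
```

For the matrix in part (c), the loop first finds $x_4 = 0$ and $x_3 = 9$, as in the discussion above, and then continues upward to the unique values of $x_2$ and $x_1$.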



Some Facts About Echelon Forms 



There are three facts about row echelon forms and reduced row echelon forms that are important to know but we will not 
prove: 

1. Every matrix has a unique reduced row echelon form; that is, regardless of whether you use Gauss-Jordan elimination
or some other sequence of elementary row operations, the same reduced row echelon form will result in the end.

2. Row echelon forms are not unique; that is, different sequences of elementary row operations can result in different
row echelon forms.

3. Although row echelon forms are not unique, all row echelon forms of a matrix A have the same number of zero rows,
and the leading 1's always occur in the same positions in the row echelon forms of A. Those are called the pivot
positions of A. A column that contains a pivot position is called a pivot column of A.



EXAMPLE 9 Pivot Positions and Columns

Earlier in this section (immediately after Definition 1) we found a row echelon form of

\[
A =
\begin{bmatrix}
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -10 & 6 & 12 & 28 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

to be

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

The leading 1's occur in positions (row 1, column 1), (row 2, column 3), and (row 3, column 5). These are
the pivot positions. The pivot columns are columns 1, 3, and 5.
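Reading off pivot positions from a row echelon form amounts to locating the first nonzero entry of each nonzero row. A small Python sketch follows; the function name `pivot_positions` is an illustrative assumption, and rows and columns are numbered from 1 to match the text.

```python
from fractions import Fraction


def pivot_positions(echelon):
    """Return (row, column) pairs, 1-based, of the leading entries of a row echelon form."""
    positions = []
    for i, row in enumerate(echelon, start=1):
        for j, value in enumerate(row, start=1):
            if value != 0:
                positions.append((i, j))
                break  # only the first nonzero entry of the row counts
    return positions


# Row echelon form of A from Example 9.
echelon_of_A = [
    [1, 2, -5, 3, 6, 14],
    [0, 0, 1, 0, Fraction(-7, 2), -6],
    [0, 0, 0, 0, 1, 2],
]
positions = pivot_positions(echelon_of_A)
pivot_columns = [column for _, column in positions]
```

For the matrix of Example 9 this yields the positions (1, 1), (2, 3), (3, 5) and the pivot columns 1, 3, and 5.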



Roundoff Error and Instability 



There is often a gap between mathematical theory and its practical implementation, with Gauss-Jordan elimination and
Gaussian elimination being good examples. The problem is that computers generally approximate numbers, thereby
introducing roundoff errors, so unless precautions are taken, successive calculations may degrade an answer to a degree
that makes it useless. Algorithms (procedures) in which this happens are called unstable. There are various techniques
for minimizing roundoff error and instability. For example, it can be shown that for large linear systems Gauss-Jordan
elimination involves roughly 50% more operations than Gaussian elimination, so most computer algorithms are based on
the latter method. Some of these matters will be considered in Chapter 9.
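The effect of roundoff can be seen even in a system of two equations. The Python sketch below is an illustration that is not taken from the text: it solves the (assumed) system εx + y = 1, x + y = 2 with ε = 10^-20 in ordinary floating-point arithmetic, first eliminating with the equations in the given order and then after interchanging them. The helper name `solve_2x2_no_swap` is hypothetical, and choosing the larger entry as the pivot is one common precaution, not necessarily the one developed later in the book.

```python
def solve_2x2_no_swap(a11, a12, b1, a21, a22, b2):
    """Gaussian elimination on a 2 x 2 system, using equation 1 as the pivot row."""
    factor = a21 / a11
    a22_new = a22 - factor * a12   # eliminate x from equation 2
    b2_new = b2 - factor * b1
    y = b2_new / a22_new           # back-substitution
    x = (b1 - a12 * y) / a11
    return x, y


eps = 1e-20

# Equations in the given order: the tiny coefficient eps is used as the pivot.
naive = solve_2x2_no_swap(eps, 1.0, 1.0, 1.0, 1.0, 2.0)

# Same system with the two equations interchanged first.
swapped = solve_2x2_no_swap(1.0, 1.0, 2.0, eps, 1.0, 1.0)
```

The exact solution has x and y both extremely close to 1. In double-precision arithmetic the first computation returns x = 0.0, which is useless, while the second returns x = 1.0 and y = 1.0. The algebra is identical in both cases; only the order of operations, and hence the roundoff, differs.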



Concept Review 

• Reduced row echelon form 

• Row echelon form 

• Leading 1 

• Leading variables 

• Free variables 

• General solution to a linear system 

• Gaussian elimination 

• Gauss-Jordan elimination 

• Forward phase 

• Backward phase 

• Homogeneous linear system 

• Trivial solution 

• Nontrivial solution 

• Free Variable Theorem for Homogeneous Systems

• Back-substitution 

Skills 

• Recognize whether a given matrix is in row echelon form, reduced row echelon form, or neither. 

• Construct solutions to linear systems whose corresponding augmented matrices are in row echelon form or
reduced row echelon form.

• Use Gaussian elimination to find the general solution of a linear system.

• Use Gauss-Jordan elimination to find the general solution of a linear system.

• Analyze homogeneous linear systems using the Free Variable Theorem for Homogeneous Systems. 



Exercise Set 1.2



1. In each part, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither. 



(a) 



(b) 



(c) 



(d) 



1 

0 
0 

"1 
0 
0 

"0 
0 
0 

'1 

0 



I] 



(e) 


'l 


2 


0 


3 


0 




u 


n 


1 
1 


1 
1 


u 




U 


U 


u 


u 


1 




n 


n 

V 


n 


n 


n 


(f) 


U 


u 










n 

u 


n 
u 










0 


0 








(g) 


[I 




7 


5 










1 


3 





Answer: 



(a) Both 

(b) Both 

(c) Both 

(d) Both 

(e) Both 

(f) Both 

(g) Row echelon 

2. In each part, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither, 
(a) 



(b) 



(c) 



(d) 



(e) 



(f) 



(g) 



1 


2 


0 






0 


1 


0 






0 


0 


0 






1 


0 


0" 






0 


1 


0 






0 


2 


0 






1 


3 


4" 






0 


0 


1 






0 


0 


0 






1 


5 




3" 




0 


1 




1 




0 


0 




0 




1 


2 


3' 






0 


0 


0 






0 


0 


1_ 






1 


2 


3 


4 


5" 


1 


0 


7 


1 


3 


0 


0 


0 


0 


1 


0 


0 


0 


0 


0 


1 




2 


0 


1 


0 




0 


1 


-2 



3. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations 
to the given reduced row echelon form. Solve the system. 



(a) 


1 
1 


7 4 


n~ 
/ 










0 


1 2 


2 










n 


n 1 










(b) 


1 


U o - 


-J 


0 








U 


1 >l 


-y 


5 








u 


n 1 
U 1 


1 








(c) 


1 


7 -2 


0 




8 


-3 




n 

u 




1 
1 






J 




U 


U U 


1 




2 






u 


U U 


u 




ft 
J 


u 


(d) 


"1 


-3 7 


r 










0 


1 4 


0 










0 


0 0 


1 









Answer:

(a) $x_1 = -37$, $x_2 = -8$, $x_3 = 5$

(b) $x_1 = 13t - 10$, $x_2 = 13t - 5$, $x_3 = -t + 2$, $x_4 = t$

(c) $x_1 = -7s + 2t - 11$, $x_2 = s$, $x_3 = -3t - 4$, $x_4 = -3t + 9$, $x_5 = t$

(d) Inconsistent



4. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations 
to the given reduced row echelon form. Solve the system. 



(a) 


1 


0 0 




-3" 








0 


1 0 




0 








0 


0 1 




7 






(b) 


'1 


0 0 




-7 


8" 






0 


1 0 




3 


2 






0 


0 1 




1 


-5 




(c) 


"1 


-6 


0 


0 


3 -2 




0 


0 


1 


0 


4 


7 




0 


0 


0 


1 


5 


8 




0 


0 


0 


0 


0 


0 


(d) 


1 


-3 


0 


0" 








0 


0 


1 


0 








0 


0 


0 


1 







In Exercises 5-8, solve the linear system by Gauss-Jordan elimination. 

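5.
\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 8 \\
-x_1 - 2x_2 + 3x_3 &= 1 \\
3x_1 - 7x_2 + 4x_3 &= 10
\end{aligned}
\]

Answer:

$x_1 = 3$, $x_2 = 1$, $x_3 = 2$

6.
\[
\begin{aligned}
2x_1 + 2x_2 + 2x_3 &= 0 \\
-2x_1 + 5x_2 + 2x_3 &= 1 \\
8x_1 + x_2 + 4x_3 &= -1
\end{aligned}
\]

7.
\[
\begin{aligned}
x - y + 2z - w &= -1 \\
2x + y - 2z - 2w &= -2 \\
-x + 2y - 4z + w &= 1 \\
3x - 3w &= -3
\end{aligned}
\]

Answer:

$x = t - 1$, $y = 2s$, $z = s$, $w = t$

8.
\[
\begin{aligned}
-2b + 3c &= 1 \\
3a + 6b - 3c &= -2 \\
6a + 6b + 3c &= 5
\end{aligned}
\]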

In Exercises 9-12, solve the linear system by Gaussian elimination. 

9. Exercise 5
Answer:

$x_1 = 3$, $x_2 = 1$, $x_3 = 2$

10. Exercise 6

11. Exercise 7

Answer:

$x = t - 1$, $y = 2s$, $z = s$, $w = t$

12. Exercise 8 

In Exercises 13-16, determine whether the homogeneous system has nontrivial solutions by inspection (without pencil 
and paper). 

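13.
\[
\begin{aligned}
2x_1 - 3x_2 + 4x_3 - x_4 &= 0 \\
7x_1 + x_2 - 8x_3 + 9x_4 &= 0 \\
2x_1 + 8x_2 + x_3 - x_4 &= 0
\end{aligned}
\]

Answer:

Has nontrivial solutions

14.
\[
\begin{aligned}
x_1 + 3x_2 - x_3 &= 0 \\
x_2 - 8x_3 &= 0 \\
4x_3 &= 0
\end{aligned}
\]

15.
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= 0 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= 0
\end{aligned}
\]

Answer:

Has nontrivial solutions

16.
\[
\begin{aligned}
3x_1 - 2x_2 &= 0 \\
6x_1 - 4x_2 &= 0
\end{aligned}
\]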



In Exercises 17-24, solve the given homogeneous linear system by any method. 

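17.
\[
\begin{aligned}
2x_1 + x_2 + 3x_3 &= 0 \\
x_1 + 2x_2 &= 0 \\
x_2 + x_3 &= 0
\end{aligned}
\]

Answer:

$x_1 = 0$, $x_2 = 0$, $x_3 = 0$

18.
\[
\begin{aligned}
2x - y - 3z &= 0 \\
-x + 2y - 3z &= 0 \\
x + y + 4z &= 0
\end{aligned}
\]

19.
\[
\begin{aligned}
3x_1 + x_2 + x_3 + x_4 &= 0 \\
5x_1 - x_2 + x_3 - x_4 &= 0
\end{aligned}
\]

Answer:

$x_1 = -s$, $x_2 = -t - s$, $x_3 = 4s$, $x_4 = t$

20.
\[
\begin{aligned}
v + 3w - 2x &= 0 \\
2u + v - 4w + 3x &= 0 \\
2u + 3v + 2w - x &= 0 \\
-4u - 3v + 5w - 4x &= 0
\end{aligned}
\]

21.
\[
\begin{aligned}
2x + 2y + 4z &= 0 \\
w - y - 3z &= 0 \\
2w + 3x + y + z &= 0 \\
-2w + x + 3y - 2z &= 0
\end{aligned}
\]

Answer:

$w = t$, $x = -t$, $y = t$, $z = 0$

22.
\[
\begin{aligned}
x_1 + 3x_2 + x_4 &= 0 \\
x_1 + 4x_2 + 2x_3 &= 0 \\
-2x_2 - 2x_3 - x_4 &= 0 \\
2x_1 - 4x_2 + x_3 + x_4 &= 0 \\
x_1 - 2x_2 - x_3 + x_4 &= 0
\end{aligned}
\]

23.
\[
\begin{aligned}
2I_1 - I_2 + 3I_3 + 4I_4 &= 9 \\
I_1 - 2I_3 + 7I_4 &= 11 \\
3I_1 - 3I_2 + I_3 + 5I_4 &= 8 \\
2I_1 + I_2 + 4I_3 + 4I_4 &= 10
\end{aligned}
\]

Answer:

$I_1 = -1$, $I_2 = 0$, $I_3 = 1$, $I_4 = 2$

24.
\[
\begin{aligned}
Z_3 + Z_4 + Z_5 &= 0 \\
-Z_1 - Z_2 + 2Z_3 - 3Z_4 + Z_5 &= 0 \\
Z_1 + Z_2 - 2Z_3 - Z_5 &= 0 \\
2Z_1 + 2Z_2 - Z_3 + Z_5 &= 0
\end{aligned}
\]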


In Exercises 25-28, determine the values of a for which the system has no solutions, exactly one solution, or infinitely
many solutions.

25.
\[
\begin{aligned}
x + 2y - 3z &= 4 \\
3x - y + 5z &= 2 \\
4x + y + (a^2 - 14)z &= a + 2
\end{aligned}
\]

Answer:

If a = 4, there are infinitely many solutions; if a = -4, there are no solutions; if $a \neq \pm 4$, there is exactly one
solution.

26.
\[
\begin{aligned}
x + 2y + z &= 2 \\
2x - 2y + 3z &= 1 \\
x + 2y - (a^2 - 3)z &= a
\end{aligned}
\]

27. 2y = 1

Answer: 

If a = 3, there are infinitely many solutions; if a = -3, there are no solutions; if $a \neq \pm 3$, there is exactly one
solution.

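28.
\[
\begin{aligned}
x + y + 7z &= -7 \\
2x + 3y + 17z &= -16 \\
x + 2y + (a^2 + 1)z &= 3a
\end{aligned}
\]

In Exercises 29-30, solve the following systems, where a, b, and c are constants.

29.
\[
\begin{aligned}
2x + y &= a \\
3x + 6y &= b
\end{aligned}
\]

Answer:

$x = \tfrac{2}{3}a - \tfrac{1}{9}b$, $y = -\tfrac{1}{3}a + \tfrac{2}{9}b$

30.
\[
\begin{aligned}
x_1 + x_2 + x_3 &= a \\
2x_1 + 2x_3 &= b
\end{aligned}
\]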
31. Find two different row echelon forms of 



2 7j 



_2 

This exercise shows that a matrix can have multiple row echelon forms. 



Answer: 

and I * ' I are possible answers. 



1 3 

0 1 

32. Reduce 



2 1 3 
0 -2 -29 

3 4 5 

to reduced row echelon form without introducing fractions at any intermediate stage. 
33. Show that the following nonlinear system has 18 solutions if 0 ≤ α ≤ 2π, 0 ≤ β ≤ 2π, and 0 ≤ γ ≤ 2π.

\[
\begin{aligned}
\sin\alpha + 2\cos\beta + 3\tan\gamma &= 0 \\
2\sin\alpha + 5\cos\beta + 3\tan\gamma &= 0 \\
-\sin\alpha - 5\cos\beta + 5\tan\gamma &= 0
\end{aligned}
\]

[Hint: Begin by making the substitutions x = sin α, y = cos β, and z = tan γ.]

34. Solve the following system of nonlinear equations for the unknown angles α, β, and γ, where 0 ≤ α ≤ 2π,
0 ≤ β ≤ 2π, and 0 ≤ γ < π.

\[
\begin{aligned}
2\sin\alpha - \cos\beta + 3\tan\gamma &= 3 \\
4\sin\alpha + 2\cos\beta - 2\tan\gamma &= 2 \\
6\sin\alpha - 3\cos\beta + \tan\gamma &= 9
\end{aligned}
\]



35. Solve the following system of nonlinear equations for x, y, and z. 

2.2, 2 
X +y + z = 

x^-7^-f2z^ = 



[Hint: Begin by making the substitutions X = x^-> ^= Z = z^-] 
Answer: 

7i= z= ±{2 

36. Solve the following system for x, y, and z. 

X y z 
7: 7 z 

37. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is the graph of the equation
$y = ax^3 + bx^2 + cx + d$.

Figure Ex-37

Answer:

a = 1, b = −6, c = 2, d = 10
38. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is given by the equation 




(4,-3) 

Figure Ex-38 

39. If the linear system 

a\x + b\y ^c\z = 0 
- b-jy + = 0 
<337: =H i^y — C3Z = 0 

has only the trivial solution, what can be said about the solutions of the following system? 

a\X'\'b\y '\'C\z = 2 

Answer: 

The nonhomogeneous system will have exactly one solution. 

40. (a) If A is a 3 × 5 matrix, then what is the maximum possible number of leading 1's in its reduced row echelon form?

(b) If B is a 3 × 6 matrix whose last column has all zeros, then what is the maximum possible number of parameters
in the general solution of the linear system with augmented matrix B?

(c) If C is a 5 × 3 matrix, then what is the minimum possible number of rows of zeros in any row echelon form of C?



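41. (a) Prove that if ad − bc ≠ 0, then the reduced row echelon form of

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\quad\text{is}\quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

(b) Use the result in part (a) to prove that if ad − bc ≠ 0, then the linear system

\[
\begin{aligned}
ax + by &= k \\
cx + dy &= l
\end{aligned}
\]

has exactly one solution.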
42. Consider the system of equations 

\[
\begin{aligned}
ax + by &= 0 \\
cx + dy &= 0 \\
ex + fy &= 0
\end{aligned}
\]

Discuss the relative positions of the lines ax + by = 0, cx + dy = 0, and ex + fy = 0 when (a) the system has
only the trivial solution, and (b) the system has nontrivial solutions.



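43. Describe all possible reduced row echelon forms of

(a)
\[
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & p & q \end{bmatrix}
\]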



True-False Exercises 

In parts (a)-(i) determine whether the statement is true or false, and justify your answer. 

(a) If a matrix is in reduced row echelon form, then it is also in row echelon form. 
Answer: 

True 

(b) If an elementary row operation is applied to a matrix that is in row echelon form, the resulting matrix will still be in 
row echelon form. 

Answer: 

False 

(c) Every matrix has a unique row echelon form. 
Answer: 

False 

(d) A homogeneous linear system in n unknowns whose corresponding augmented matrix has a reduced row echelon
form with r leading 1's has n − r free variables.

Answer: 

True 

(e) All leading 1's in a matrix in row echelon form must occur in different columns.
Answer: 

True 

(f) If every column of a matrix in row echelon form has a leading 1, then all entries that are not leading 1's are zero.
Answer: 

False 

(g) If a homogeneous linear system of n equations in n unknowns has a corresponding augmented matrix with a reduced
row echelon form containing n leading 1's, then the linear system has only the trivial solution.

Answer: 

True 



(h) If the reduced row echelon form of the augmented matrix for a linear system has a row of zeros, then the system must 
have infinitely many solutions. 

Answer: 

False 

(i) If a linear system has more unknowns than equations, then it must have infinitely many solutions. 
Answer: 

False 

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



1.3 Matrices and Matrix Operations 

Rectangular arrays of real numbers arise in contexts other than as augmented matrices for linear systems. In this 
section we will begin to study matrices as objects in their own right by defining operations of addition, subtraction, 
and multiplication on them. 



Matrix Notation and Terminology 

In Section 1.2 we used rectangular arrays of numbers, called augmented matrices, to abbreviate systems of linear 
equations. However, rectangular arrays of numbers occur in other contexts as well. For example, the following 
rectangular array with three rows and seven columns might describe the number of hours that a student spent studying 
three subjects during a certain week: 





            Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.   Sun.
Math         2      3       2      4        1      4      2
History      0      3       1      4        3      2      2
Language     4      1       3      1        0      0      2



If we suppress the headings, then we are left with the following rectangular array of numbers with three rows and 
seven columns, called a "matrix": 



\[
\begin{bmatrix}
2 & 3 & 2 & 4 & 1 & 4 & 2 \\
0 & 3 & 1 & 4 & 3 & 2 & 2 \\
4 & 1 & 3 & 1 & 0 & 0 & 2
\end{bmatrix}
\]



More generally, we make the following definition. 




DEFINITION 1 

A matrix is a rectangular array of numbers. The numbers in the array are called the entries in the matrix. 



A matrix with only one column is called a column
vector or a column matrix, and a matrix with only
one row is called a row vector or a row matrix. In
Example 1, the 2 × 1 matrix is a column vector, the
1 × 4 matrix is a row vector, and the 1 × 1 matrix
is both a row vector and a column vector.



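EXAMPLE 1 Examples of Matrices

Some examples of matrices are

\[
\begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix}, \quad
\begin{bmatrix} 2 & 1 & 0 & -3 \end{bmatrix}, \quad
\begin{bmatrix} e & \pi & -\sqrt{2} \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad
\begin{bmatrix} 4 \end{bmatrix}
\]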



The size of a matrix is described in terms of the number of rows (horizontal lines) and columns (vertical lines) it
contains. For example, the first matrix in Example 1 has three rows and two columns, so its size is 3 by 2 (written
3 × 2). In a size description, the first number always denotes the number of rows, and the second denotes the number
of columns. The remaining matrices in Example 1 have sizes 1 × 4, 3 × 3, 2 × 1, and 1 × 1, respectively.

We will use capital letters to denote matrices and lowercase letters to denote numerical quantities; thus we might write

\[
C = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}
\]

When discussing matrices, it is common to refer to numerical quantities as scalars. Unless stated otherwise, scalars
will be real numbers; complex scalars will be considered later in the text.

Matrix brackets are often omitted from 1 × 1
matrices, making it impossible to tell, for example,
whether the symbol 4 denotes the number "four" or
the matrix [4]. This rarely causes problems because
it is usually possible to tell which is meant from the
context.



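The entry that occurs in row i and column j of a matrix A will be denoted by $a_{ij}$. Thus a general 3 × 4 matrix might be
written as

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{bmatrix}
\]

and a general m × n matrix as

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\tag{1}
\]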



When a compact notation is desired, the preceding matrix can be written as

\[
[a_{ij}]_{m \times n} \quad\text{or}\quad [a_{ij}]
\]

the first notation being used when it is important in the discussion to know the size, and the second being used when
the size need not be emphasized. Usually, we will match the letter denoting a matrix with the letter denoting its
entries; thus, for a matrix B we would generally use $b_{ij}$ for the entry in row i and column j, and for a matrix C we
would use the notation $c_{ij}$.



The entry in row i and column j of a matrix A is also commonly denoted by the symbol $(A)_{ij}$. Thus, for matrix (1)
above, we have

\[
(A)_{ij} = a_{ij}
\]

and for the matrix

\[
A = \begin{bmatrix} 2 & -3 \\ 7 & 0 \end{bmatrix}
\]

we have $(A)_{11} = 2$, $(A)_{12} = -3$, $(A)_{21} = 7$, and $(A)_{22} = 0$.



Row and column vectors are of special importance, and it is common practice to denote them by boldface lowercase
letters rather than capital letters. For such matrices, double subscripting of the entries is unnecessary. Thus a general
1 × n row vector a and a general m × 1 column vector b would be written as

\[
\mathbf{a} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]



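A matrix A with n rows and n columns is called a square matrix of order n, and the shaded entries $a_{11}, a_{22}, \ldots, a_{nn}$
in (2) are said to be on the main diagonal of A.

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\tag{2}
\]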



Operations on Matrices 

So far, we have used matrices to abbreviate the work in solving systems of linear equations. For other applications, 
however, it is desirable to develop an "arithmetic of matrices" in which matrices can be added, subtracted, and 
multiplied in a useful way. The remainder of this section will be devoted to developing this arithmetic. 

DEFINITION 2

Two matrices are defined to be equal if they have the same size and their corresponding entries are equal.

The equality of two matrices

$A = [a_{ij}]$ and $B = [b_{ij}]$

of the same size can be expressed either by writing

$(A)_{ij} = (B)_{ij}$

or by writing

$a_{ij} = b_{ij}$

where it is understood that the equalities hold for
all values of i and j.



EXAMPLE 2 Equality of Matrices

Consider the matrices

\[
A = \begin{bmatrix} 2 & 1 \\ 3 & x \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 1 \\ 3 & 5 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \end{bmatrix}
\]

If x = 5, then A = B, but for all other values of x the matrices A and B are not equal, since not all of
their corresponding entries are equal. There is no value of x for which A = C since A and C have
different sizes.

DEFINITION 3 

If A and B are matrices of the same size, then the sum A + B is the matrix obtained by adding the entries of B
to the corresponding entries of A, and the difference A − B is the matrix obtained by subtracting the entries of
B from the corresponding entries of A. Matrices of different sizes cannot be added or subtracted.

In matrix notation, if $A = [a_{ij}]$ and $B = [b_{ij}]$ have the same size, then

\[
(A + B)_{ij} = (A)_{ij} + (B)_{ij} = a_{ij} + b_{ij}
\quad\text{and}\quad
(A - B)_{ij} = (A)_{ij} - (B)_{ij} = a_{ij} - b_{ij}
\]

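EXAMPLE 3 Addition and Subtraction

Consider the matrices

\[
A = \begin{bmatrix} 2 & 1 & 0 & 3 \\ -1 & 0 & 2 & 4 \\ 4 & -2 & 7 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} -4 & 3 & 5 & 1 \\ 2 & 2 & 0 & -1 \\ 3 & 2 & -4 & 5 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}
\]

Then

\[
A + B = \begin{bmatrix} -2 & 4 & 5 & 4 \\ 1 & 2 & 2 & 3 \\ 7 & 0 & 3 & 5 \end{bmatrix}
\quad\text{and}\quad
A - B = \begin{bmatrix} 6 & -2 & -5 & 2 \\ -3 & -2 & 2 & 5 \\ 1 & -4 & 11 & -5 \end{bmatrix}
\]

The expressions A + C, B + C, A − C, and B − C are undefined.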
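The entrywise rule in Definition 3 can be sketched in a few lines of Python. This is only an illustration: matrices are stored as lists of rows, and the helper names `same_size`, `mat_add`, and `mat_sub` are chosen here and do not come from the text.

```python
def same_size(A, B):
    """True when A and B have the same number of rows and columns."""
    return len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))

def mat_add(A, B):
    """Entrywise sum; undefined (ValueError) for different sizes."""
    if not same_size(A, B):
        raise ValueError("matrices of different sizes cannot be added")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    """Entrywise difference; undefined (ValueError) for different sizes."""
    if not same_size(A, B):
        raise ValueError("matrices of different sizes cannot be subtracted")
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# The 3 x 4 matrices A and B of Example 3
A = [[2, 1, 0, 3], [-1, 0, 2, 4], [4, -2, 7, 0]]
B = [[-4, 3, 5, 1], [2, 2, 0, -1], [3, 2, -4, 5]]

print(mat_add(A, B))
print(mat_sub(A, B))
```

Calling `mat_add` with a matrix of a different size raises an error, which mirrors the statement that such sums are undefined.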






DEFINITION 4 



If A is any matrix and c is any scalar, then the product cA is the matrix obtained by multiplying each entry of 
the matrix A by c. The matrix cA is said to be a scalar multiple of A. 



In matrix notation, if $A = [a_{ij}]$, then

\[
(cA)_{ij} = c(A)_{ij} = c\,a_{ij}
\]



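EXAMPLE 4 Scalar Multiples

For the matrices

\[
A = \begin{bmatrix} 2 & 3 & 4 \\ 1 & 3 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 0 & 2 & 7 \\ -1 & 3 & -5 \end{bmatrix}, \quad
C = \begin{bmatrix} 9 & -6 & 3 \\ 3 & 0 & 12 \end{bmatrix}
\]

we have

\[
2A = \begin{bmatrix} 4 & 6 & 8 \\ 2 & 6 & 2 \end{bmatrix}, \quad
(-1)B = \begin{bmatrix} 0 & -2 & -7 \\ 1 & -3 & 5 \end{bmatrix}, \quad
\tfrac{1}{3}C = \begin{bmatrix} 3 & -2 & 1 \\ 1 & 0 & 4 \end{bmatrix}
\]

It is common practice to denote (−1)B by −B.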
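As a quick check of Definition 4, the following Python sketch multiplies every entry by a scalar. The function name `scalar_multiple` is an assumption made for this illustration, and `Fraction` is used so that the factor 1/3 is handled exactly rather than in floating point.

```python
from fractions import Fraction

def scalar_multiple(c, A):
    """Return cA: each entry of A multiplied by the scalar c."""
    return [[c * a for a in row] for row in A]

# The matrices of Example 4
A = [[2, 3, 4], [1, 3, 1]]
B = [[0, 2, 7], [-1, 3, -5]]
C = [[9, -6, 3], [3, 0, 12]]

print(scalar_multiple(2, A))               # 2A
print(scalar_multiple(-1, B))              # (-1)B, usually written -B
print(scalar_multiple(Fraction(1, 3), C))  # (1/3)C with exact arithmetic
```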



Thus far we have defined multiplication of a matrix by a scalar but not the multiplication of two matrices. Since
matrices are added by adding corresponding entries and subtracted by subtracting corresponding entries, it would
seem natural to define multiplication of matrices by multiplying corresponding entries. However, it turns out that such
a definition would not be very useful for most problems. Experience has led mathematicians to the following more
useful definition of matrix multiplication.



DEFINITION 5 

If A is an m × r matrix and B is an r × n matrix, then the product AB is the m × n matrix whose entries are
determined as follows: To find the entry in row i and column j of AB, single out row i from the matrix A and
column j from the matrix B. Multiply the corresponding entries from the row and column together, and then
add up the resulting products.



EXAMPLE 5 Multiplying Matrices

Consider the matrices

\[
A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}
\]

Since A is a 2 × 3 matrix and B is a 3 × 4 matrix, the product AB is a 2 × 4 matrix. To determine, for
example, the entry in row 2 and column 3 of AB, we single out row 2 from A and column 3 from B.
Then we multiply corresponding entries together and add up these products:

(2 · 4) + (6 · 3) + (0 · 5) = 26

The entry in row 1 and column 4 of AB is computed as follows:

(1 · 3) + (2 · 1) + (4 · 2) = 13

The computations for the remaining entries are

(1 · 4) + (2 · 0) + (4 · 2) = 12
(1 · 1) − (2 · 1) + (4 · 7) = 27
(1 · 4) + (2 · 3) + (4 · 5) = 30
(2 · 4) + (6 · 0) + (0 · 2) = 8
(2 · 1) − (6 · 1) + (0 · 7) = −4
(2 · 3) + (6 · 1) + (0 · 2) = 12

\[
AB = \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix}
\]
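The row-column rule of Definition 5 translates directly into code. The sketch below is a minimal illustration, not an efficient implementation; the function name `mat_mul` is chosen here, and matrices are assumed to be stored as lists of rows. It recomputes the product from Example 5.

```python
def mat_mul(A, B):
    """Row-column rule: entry (i, j) of AB is the sum of products of
    row i of A with column j of B."""
    r = len(B)  # number of rows of B
    if any(len(row) != r for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(len(A))]

# The matrices of Example 5 (A is 2 x 3, B is 3 x 4)
A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]

AB = mat_mul(A, B)
print(AB)        # the 2 x 4 product
print(AB[1][2])  # entry in row 2, column 3 (indices start at 0 in Python)
```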



The definition of matrix multiplication requires that the number of columns of the first factor A be the same as the
number of rows of the second factor B in order to form the product AB. If this condition is not satisfied, the product is
undefined. A convenient way to determine whether a product of two matrices is defined is to write down the size of
the first factor and, to the right of it, write down the size of the second factor. If, as in (3), the inside numbers are the
same, then the product is defined. The outside numbers then give the size of the product.

\[
\underset{m \times r}{A} \quad \underset{r \times n}{B} \;=\; \underset{m \times n}{AB}
\tag{3}
\]

In (3), the inside numbers are the two r's, and the outside numbers are m and n.



Gotthold Eisenstein (1823-1852) 



Historical Note The concept of matrix multiplication is due to the German mathematician Gotthold 
Eisenstein, who introduced the idea around 1844 to simplify the process of making substitutions in linear
systems. The idea was then expanded on and formalized by Cayley in his Memoir on the Theory of Matrices 
that was published in 1858. Eisenstein was a pupil of Gauss, who ranked him as the equal of Isaac Newton 
and Archimedes. However, Eisenstein, suffering from bad health his entire life, died at age 30, so his potential 
was never realized. 
[Image: wikipedia] 



EXAMPLE 6 Determining Whether a Product Is Defined

Suppose that A, B, and C are matrices with the following sizes:

  A        B        C
3 × 4    4 × 7    7 × 3

Then by (3), AB is defined and is a 3 × 7 matrix; BC is defined and is a 4 × 3 matrix; and CA is defined
and is a 7 × 4 matrix. The products AC, CB, and BA are all undefined.
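The inside/outside size check can be written as a small helper. In this sketch the name `product_size` and the representation of a size as a `(rows, columns)` tuple are assumptions made for illustration; the function returns `None` when the product is undefined.

```python
def product_size(size_a, size_b):
    """Return the size of the product, or None if it is undefined.

    Sizes are (rows, columns) pairs. The product is defined exactly when
    the inside numbers agree; the outside numbers give the result size.
    """
    m, r = size_a
    r2, n = size_b
    return (m, n) if r == r2 else None

# The sizes from Example 6
sizes = {"A": (3, 4), "B": (4, 7), "C": (7, 3)}

for left, right in [("A", "B"), ("B", "C"), ("C", "A"),
                    ("A", "C"), ("C", "B"), ("B", "A")]:
    print(left + right, product_size(sizes[left], sizes[right]))
```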



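In general, if $A = [a_{ij}]$ is an m × r matrix and $B = [b_{ij}]$ is an r × n matrix, then, as illustrated by the shading in (4),

\[
AB = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1r} \\
a_{21} & a_{22} & \cdots & a_{2r} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ir} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mr}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
b_{r1} & b_{r2} & \cdots & b_{rj} & \cdots & b_{rn}
\end{bmatrix}
\tag{4}
\]

the entry $(AB)_{ij}$ in row i and column j of AB is given by

\[
(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} + \cdots + a_{ir}b_{rj}
\tag{5}
\]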



Partitioned Matrices 



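A matrix can be subdivided or partitioned into smaller matrices by inserting horizontal and vertical rules between
selected rows and columns. For example, the following are three possible partitions of a general 3 × 4 matrix A: the
first is a partition of A into four submatrices $A_{11}$, $A_{12}$, $A_{21}$, and $A_{22}$; the second is a partition of A into its row vectors
$\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$; and the third is a partition of A into its column vectors $\mathbf{c}_1$, $\mathbf{c}_2$, $\mathbf{c}_3$, and $\mathbf{c}_4$:

\[
A = \left[\begin{array}{ccc|c}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34}
\end{array}\right]
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\]

\[
A = \left[\begin{array}{cccc}
a_{11} & a_{12} & a_{13} & a_{14} \\ \hline
a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34}
\end{array}\right]
= \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix}
\]

\[
A = \left[\begin{array}{c|c|c|c}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{array}\right]
= \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 & \mathbf{c}_4 \end{bmatrix}
\]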



Matrix Multiplication by Columns and by Rows 

Partitioning has many uses, one of which is for finding particular rows or columns of a matrix product AB without 
computing the entire product. Specifically, the following formulas, whose proofs are left as exercises, show how 
individual column vectors of AB can be obtained by partitioning B into column vectors and how individual row 
vectors of AB can be obtained by partitioning A into row vectors. 



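\[
AB = A\begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix}
= \begin{bmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_n \end{bmatrix}
\tag{6}
\]
(AB computed column by column)

\[
AB = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_m \end{bmatrix} B
= \begin{bmatrix} \mathbf{a}_1 B \\ \mathbf{a}_2 B \\ \vdots \\ \mathbf{a}_m B \end{bmatrix}
\tag{7}
\]
(AB computed row by row)

In words, these formulas state that

jth column vector of AB = A [jth column vector of B]    (8)

ith row vector of AB = [ith row vector of A] B    (9)

EXAMPLE 7 Example 5 Revisited

If A and B are the matrices in Example 5, then from (8) the second column vector of AB can be obtained
by the computation

\[
\begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix}
= \begin{bmatrix} 27 \\ -4 \end{bmatrix}
\]

Here the column on the left is the second column of B, and the result is the second column of AB. From (9), the
first row vector of AB can be obtained by the computation

\[
\begin{bmatrix} 1 & 2 & 4 \end{bmatrix}
\begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}
= \begin{bmatrix} 12 & 27 & 30 & 13 \end{bmatrix}
\]

Here the row vector on the left is the first row of A, and the result is the first row of AB.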

Matrix Products as Linear Combinations 

We have discussed three methods for computing a matrix product AB — entry by entry, column by column, and row by 
row. The following definition provides yet another way of thinking about matrix multiplication. 

DEFINITION 6

If $A_1, A_2, \ldots, A_r$ are matrices of the same size, and if $c_1, c_2, \ldots, c_r$ are scalars, then an expression of the
form

\[
c_1A_1 + c_2A_2 + \cdots + c_rA_r
\]

is called a linear combination of $A_1, A_2, \ldots, A_r$ with coefficients $c_1, c_2, \ldots, c_r$.

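To see how matrix products can be viewed as linear combinations, let A be an m × n matrix and x an n × 1 column
vector, say

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\quad\text{and}\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

Then

\[
A\mathbf{x} = \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
= x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}
\tag{10}
\]

This proves the following theorem.

THEOREM 1.3.1

If A is an m × n matrix, and if x is an n × 1 column vector, then the product Ax can be expressed as a linear
combination of the column vectors of A in which the coefficients are the entries of x.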

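EXAMPLE 8 Matrix Products as Linear Combinations

The matrix product

\[
\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix}
\begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}
= \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}
\]

can be written as the following linear combination of column vectors

\[
2\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}
- 1\begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}
+ 3\begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix}
= \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}
\]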
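Theorem 1.3.1 can be checked numerically on the data of Example 8. The sketch below builds Ax in two ways: row by row, and as a linear combination of the columns of A weighted by the entries of x. The function names are illustrative choices, not notation from the text.

```python
def mat_vec(A, x):
    """Ax computed row by row (each entry is a row of A dotted with x)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def column_combination(A, x):
    """Ax computed as x1*(column 1) + x2*(column 2) + ... + xn*(column n)."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            result[i] += xj * A[i][j]  # add xj times entry i of column j
    return result

# The matrix and vector of Example 8
A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
x = [2, -1, 3]

print(mat_vec(A, x))
print(column_combination(A, x))
```

Both calls produce the same vector, as the theorem asserts.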



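EXAMPLE 9 Columns of a Product AB as Linear Combinations

We showed in Example 5 that

\[
AB = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}
\begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}
= \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix}
\]

It follows from Formula (6) and Theorem 1.3.1 that the jth column vector of AB can be expressed as a
linear combination of the column vectors of A in which the coefficients in the linear combination are the
entries from the jth column of B. The computations are as follows:

\[
\begin{aligned}
\begin{bmatrix} 12 \\ 8 \end{bmatrix} &= 4\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2\begin{bmatrix} 4 \\ 0 \end{bmatrix} \\
\begin{bmatrix} 27 \\ -4 \end{bmatrix} &= \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 7\begin{bmatrix} 4 \\ 0 \end{bmatrix} \\
\begin{bmatrix} 30 \\ 26 \end{bmatrix} &= 4\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 3\begin{bmatrix} 2 \\ 6 \end{bmatrix} + 5\begin{bmatrix} 4 \\ 0 \end{bmatrix} \\
\begin{bmatrix} 13 \\ 12 \end{bmatrix} &= 3\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2\begin{bmatrix} 4 \\ 0 \end{bmatrix}
\end{aligned}
\]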



Matrix Form of a Linear System



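Matrix multiplication has an important application to systems of linear equations. Consider a system of m linear
equations in n unknowns:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

Since two matrices are equal if and only if their corresponding entries are equal, we can replace the m equations in
this system by the single matrix equation

\[
\begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]

The m × 1 matrix on the left side of this equation can be written as a product to give

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]

If we designate these matrices by A, x, and b, respectively, then we can replace the original system of m equations in
n unknowns by the single matrix equation

\[
A\mathbf{x} = \mathbf{b}
\]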

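The matrix A in this equation is called the coefficient matrix of the system. The augmented matrix for the system is
obtained by adjoining b to A as the last column; thus the augmented matrix is

\[
[A \mid \mathbf{b}] = \left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right]
\]

The vertical bar in [A | b] is a convenient way to
separate A from b visually; it has no mathematical
significance.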
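To make the matrix form concrete, the following Python sketch stores a small system as a coefficient matrix and a right-hand side, builds the augmented matrix, and checks a candidate solution against Ax = b. The 2 × 2 system used here (x1 + 2x2 = 5, 3x1 − x2 = 1) is a made-up illustration, not one of the text's examples, and the helper names are assumptions.

```python
def augmented(A, b):
    """Adjoin b to A as the last column, giving [A | b]."""
    return [row + [bi] for row, bi in zip(A, b)]

def mat_vec(A, x):
    """The product Ax, with x given as a flat list of entries."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# Hypothetical system:  x1 + 2*x2 = 5,   3*x1 - x2 = 1
A = [[1, 2], [3, -1]]  # coefficient matrix
b = [5, 1]             # right-hand side

print(augmented(A, b))

# A candidate x solves the system exactly when Ax equals b
candidate = [1, 2]
print(mat_vec(A, candidate) == b)
```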



Transpose of a Matrix 

We conclude this section by defining two matrix operations that have no analogs in the arithmetic of real numbers.

DEFINITION 7

If A is any m × n matrix, then the transpose of A, denoted by $A^T$, is defined to be the n × m matrix that results
by interchanging the rows and columns of A; that is, the first column of $A^T$ is the first row of A, the second
column of $A^T$ is the second row of A, and so forth.



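EXAMPLE 10 Some Transposes

The following are some examples of matrices and their transposes.

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 3 \\ 1 & 4 \\ 5 & 6 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 3 & 5 \end{bmatrix}, \quad
D = \begin{bmatrix} 4 \end{bmatrix}
\]

\[
A^T = \begin{bmatrix}
a_{11} & a_{21} & a_{31} \\
a_{12} & a_{22} & a_{32} \\
a_{13} & a_{23} & a_{33} \\
a_{14} & a_{24} & a_{34}
\end{bmatrix}, \quad
B^T = \begin{bmatrix} 2 & 1 & 5 \\ 3 & 4 & 6 \end{bmatrix}, \quad
C^T = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}, \quad
D^T = \begin{bmatrix} 4 \end{bmatrix}
\]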



Observe that not only are the columns of $A^T$ the rows of A, but the rows of $A^T$ are the columns of A. Thus the entry in
row i and column j of $A^T$ is the entry in row j and column i of A; that is,

\[
(A^T)_{ij} = (A)_{ji}
\tag{11}
\]

Note the reversal of the subscripts.

In the special case where A is a square matrix, the transpose of A can be obtained by interchanging entries that are
symmetrically positioned about the main diagonal. In (12) we see that $A^T$ can also be obtained by "reflecting" A about
its main diagonal.

[Figure (12): Interchange entries that are symmetrically positioned about the main diagonal.]

DEFINITION 8 

If A is a square matrix, then the trace of A, denoted by tr(A), is defined to be the sum of the entries on the
main diagonal of A. The trace of A is undefined if A is not a square matrix.







James Sylvester (1814-1897) 




Arthur Cayley (1821-1895) 



Historical Note The term matrix was first used by the English mathematician (and lawyer) James Sylvester,
who defined the term in 1850 to be an "oblong arrangement of terms." Sylvester communicated his work on
matrices to a fellow English mathematician and lawyer named Arthur Cayley, who then introduced some of
the basic operations on matrices in a book entitled Memoir on the Theory of Matrices that was published in
1858. As a matter of interest, Sylvester, who was Jewish, did not get his college degree because he refused to
sign a required oath to the Church of England. He was appointed to a chair at the University of Virginia in the
United States but resigned after swatting a student with a stick because he was reading a newspaper in class.
Sylvester, thinking he had killed the student, fled back to England on the first available ship. Fortunately, the
student was not dead, just in shock!
[Images: The Granger Collection, New York] 



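EXAMPLE 11 Trace of a Matrix

The following are examples of matrices and their traces.

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}, \quad
B = \begin{bmatrix}
-1 & 2 & 7 & 0 \\
3 & 5 & -8 & 4 \\
1 & 2 & 7 & -3 \\
4 & -2 & 1 & 0
\end{bmatrix}
\]

\[
\operatorname{tr}(A) = a_{11} + a_{22} + a_{33}, \quad
\operatorname{tr}(B) = -1 + 5 + 7 + 0 = 11
\]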
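Both operations are short to express in code. The sketch below is an illustration only: `transpose` and `trace` are names chosen here, matrices are lists of rows, and the 4 × 4 matrix is the one read from Example 11 (only its diagonal entries affect the trace).

```python
def transpose(A):
    """Interchange rows and columns: entry (i, j) of the result is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]

def trace(A):
    """Sum of the main-diagonal entries; undefined (ValueError) if A is not square."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("the trace is defined only for square matrices")
    return sum(A[i][i] for i in range(len(A)))

# The 3 x 2 matrix B of Example 10
B10 = [[2, 3], [1, 4], [5, 6]]
print(transpose(B10))

# The 4 x 4 matrix B of Example 11
B11 = [[-1, 2, 7, 0], [3, 5, -8, 4], [1, 2, 7, -3], [4, -2, 1, 0]]
print(trace(B11))
```

Transposing twice returns the original matrix, which is a handy sanity check for any implementation.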



In the exercises you will have some practice working with the transpose and trace operations. 



Concept Review 

• Matrix 

• Entries 

• Column vector (or column matrix) 

• Row vector (or row matrix) 

• Square matrix 

• Main diagonal 

• Equal matrices 

• Matrix operations: sum, difference, scalar multiplication 

• Linear combination of matrices 

• Product of matrices (matrix multiplication) 

• Partitioned matrices 

• Submatrices 

• Row-column method 

• Column method 

• Row method 

• Coefficient matrix of a linear system 

• Transpose 

• Trace 



Skills 



• Determine the size of a given matrix. 

• Identify the row vectors and column vectors of a given matrix. 

• Perform the arithmetic operations of matrix addition, subtraction, scalar multiplication, and multiplication. 

• Determine whether the product of two given matrices is defined. 

• Compute matrix products using the row-column method, the column method, and the row method. 

• Express the product of a matrix and a column vector as a linear combination of the columns of the matrix. 

• Express a linear system as a matrix equation, and identify the coefficient matrix. 

• Compute the transpose of a matrix. 

• Compute the trace of a square matrix. 



Exercise Set 1.3 

1. Suppose that A, B, C, D, and E are matrices with the following sizes:

   A         B         C         D         E
(4 × 5)   (4 × 5)   (5 × 2)   (4 × 2)   (5 × 4)

In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of
the resulting matrix.

(a) BA
(b) AC + D
(c) AB + B
(d) AB + B
(e) E(A + B)
(f) E(AC)
(g) $E^T A$
(h)

Answer:

(a) Undefined
(b) 4 × 2
(c) Undefined
(d) Undefined
(e) 5 × 5
(f) 5 × 2
(g) Undefined
(h) 5 × 2

2. Suppose that A, B, C, D, and E are matrices with the following sizes: 



A B C D E 

(3x1) (3x6) (6x2) (2x6) (1x3) 

In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of 
the resulting matrix. 

(a) EA 

(b) $AB^T$

(c) $B^T(A + E^T)$

(d) + C 

(e) $(C^T + D)B^T$

(f) $CD + B^T E^T$

(g) $(BD^T)C^T$

(h) DC + BA 

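3. Consider the matrices

\[
A = \begin{bmatrix} 3 & 0 \\ -1 & 2 \\ 1 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 4 & 2 \\ 3 & 1 & 5 \end{bmatrix}, \quad
D = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}, \quad
E = \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix}
\]

In each part, compute the given expression (where possible).

(a) D + E

(b) D − E

(c) 5A

(d) −7C

(e) 2B − C

(f) 4E − 2D

(g) −3(D + 2E)

(h) A − A

(i) tr(D)

(j) tr(D − 3E)

(k) 4 tr(7B)

(l) tr(A)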








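Answer:

(a) \[ \begin{bmatrix} 7 & 6 & 5 \\ -2 & 1 & 3 \\ 7 & 3 & 7 \end{bmatrix} \]

(b) \[ \begin{bmatrix} -5 & 4 & -1 \\ 0 & -1 & -1 \\ -1 & 1 & 1 \end{bmatrix} \]

(c) \[ \begin{bmatrix} 15 & 0 \\ -5 & 10 \\ 5 & 5 \end{bmatrix} \]

(d) \[ \begin{bmatrix} -7 & -28 & -14 \\ -21 & -7 & -35 \end{bmatrix} \]

(e) Undefined

(f) \[ \begin{bmatrix} 22 & -6 & 8 \\ -2 & 4 & 6 \\ 10 & 0 & 4 \end{bmatrix} \]

(g) \[ \begin{bmatrix} -39 & -21 & -24 \\ 9 & -6 & -15 \\ -33 & -12 & -30 \end{bmatrix} \]

(h) \[ \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \]

(i) 5

(j) −25

(k) 168

(l) Undefined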

4. Using the matrices in Exercise 3, in each part compute the given expression (where possible).

a) 2A^+C 

b) d'^-E'^ 

c) ip-E)^ 

d) fi'' + 5C'' 

e) Icr.lA 

f) B-B^ 

g) "iM^-ZD^ 



h) 



i) {CD)E 

0) C{BA) 
k) tr(£)£^) 

1) tr(5Q 

5. Using the matrices in Exercise 3, in each part compute the given expression (where possible).

(a) AB

(b) BA

(c) (3E)D

(d) (AB)C

(e) A(BC)

(g) $(DA)^T$

(h) $(C^T B)A^T$

(i) $\operatorname{tr}(DD^T)$

(j) $\operatorname{tr}(4E^T - D)$

(k) $\operatorname{tr}(C^T A^T + 2E^T)$

(l) $\operatorname{tr}\left((EC^T)^T A\right)$



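Answer:

(a) \[ \begin{bmatrix} 12 & -3 \\ -4 & 5 \\ 4 & 1 \end{bmatrix} \]

(b) Undefined

(c) \[ \begin{bmatrix} 42 & 108 & 75 \\ 12 & -3 & 21 \\ 36 & 78 & 63 \end{bmatrix} \]

(d) \[ \begin{bmatrix} 3 & 45 & 9 \\ 11 & -11 & 17 \\ 7 & 17 & 13 \end{bmatrix} \]

(e) \[ \begin{bmatrix} 3 & 45 & 9 \\ 11 & -11 & 17 \\ 7 & 17 & 13 \end{bmatrix} \]

(g) \[ \begin{bmatrix} 0 & -2 & 11 \\ 12 & 1 & 8 \end{bmatrix} \]

(h) \[ \begin{bmatrix} 12 & 6 & 9 \\ 48 & -20 & 14 \\ 24 & 8 & 16 \end{bmatrix} \]

(i) 61

(j) 35

(k) 28

(l) 99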



6. Using the matrices in Exercise 3, in each part compute the given expression (where possible). 

(a) (2Z)^-£)^ 

(b) (45)C+25 



(c) (-iiO^ + 5Z)^ 



(d) 



(e) B^fcC^-A'^A^ 

(f) nV-iSD)^ 



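7. Let

\[
A = \begin{bmatrix} 3 & -2 & 7 \\ 6 & 5 & 4 \\ 0 & 4 & 9 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 6 & -2 & 4 \\ 0 & 1 & 3 \\ 7 & 7 & 5 \end{bmatrix}
\]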

Use the row method or column method (as appropriate) to find 

(a) the first row of AB. 

(b) the third row of AB.

(c) the second column of AB. 

(d) the first column of BA. 

(e) the third row of AA. 

(f) the third column of AA. 



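Answer:

(a) \[ \begin{bmatrix} 67 & 41 & 41 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 63 & 67 & 57 \end{bmatrix} \]

(c) \[ \begin{bmatrix} 41 \\ 21 \\ 67 \end{bmatrix} \]

(d) \[ \begin{bmatrix} 6 \\ 6 \\ 63 \end{bmatrix} \]

(e) \[ \begin{bmatrix} 24 & 56 & 97 \end{bmatrix} \]

(f) \[ \begin{bmatrix} 76 \\ 98 \\ 97 \end{bmatrix} \]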





8. Referring to the matrices in Exercise 7, use the row method or column method (as appropriate) to find 

(a) the first column of AB. 

(b) the third column of BB. 

(c) the second row of BB. 

(d) the first column of AA. 

(e) the third column of AB. 

(f) the first row of BA. 

9. Referring to the matrices A and B in Exercise 7, and Example 9, 

(a) express each column vector of AA as a linear combination of the column vectors of A.

(b) express each column vector of BB as a linear combination of the column vectors of B. 



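Answer:

(a)
\[
\begin{aligned}
\begin{bmatrix} -3 \\ 48 \\ 24 \end{bmatrix} &= 3\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} + 6\begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix} + 0\begin{bmatrix} 7 \\ 4 \\ 9 \end{bmatrix} \\
\begin{bmatrix} 12 \\ 29 \\ 56 \end{bmatrix} &= -2\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} + 5\begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix} + 4\begin{bmatrix} 7 \\ 4 \\ 9 \end{bmatrix} \\
\begin{bmatrix} 76 \\ 98 \\ 97 \end{bmatrix} &= 7\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} + 4\begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix} + 9\begin{bmatrix} 7 \\ 4 \\ 9 \end{bmatrix}
\end{aligned}
\]

(b)
\[
\begin{aligned}
\begin{bmatrix} 64 \\ 21 \\ 77 \end{bmatrix} &= 6\begin{bmatrix} 6 \\ 0 \\ 7 \end{bmatrix} + 0\begin{bmatrix} -2 \\ 1 \\ 7 \end{bmatrix} + 7\begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix} \\
\begin{bmatrix} 14 \\ 22 \\ 28 \end{bmatrix} &= -2\begin{bmatrix} 6 \\ 0 \\ 7 \end{bmatrix} + \begin{bmatrix} -2 \\ 1 \\ 7 \end{bmatrix} + 7\begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix} \\
\begin{bmatrix} 38 \\ 18 \\ 74 \end{bmatrix} &= 4\begin{bmatrix} 6 \\ 0 \\ 7 \end{bmatrix} + 3\begin{bmatrix} -2 \\ 1 \\ 7 \end{bmatrix} + 5\begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix}
\end{aligned}
\]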


10. Referring to the matrices A and B in Exercise 7, and Example 9, 

(a) express each column vector of AB as a linear combination of the column vectors of A.

(b) express each column vector of BA as a linear combination of the column vectors of B. 



11. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation
Ax = b, and write out this matrix equation.

(a)
\[
\begin{aligned}
2x_1 - 3x_2 + 5x_3 &= 7 \\
9x_1 - x_2 + x_3 &= -1 \\
x_1 + 5x_2 + 4x_3 &= 0
\end{aligned}
\]

(b)
\[
\begin{aligned}
4x_1 - 3x_3 + x_4 &= 1 \\
5x_1 + x_2 - 8x_4 &= 3 \\
2x_1 - 5x_2 + 9x_3 - x_4 &= 0 \\
3x_2 - x_3 + 7x_4 &= 2
\end{aligned}
\]

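Answer:

(a)
\[
\begin{bmatrix} 2 & -3 & 5 \\ 9 & -1 & 1 \\ 1 & 5 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 7 \\ -1 \\ 0 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 4 & 0 & -3 & 1 \\ 5 & 1 & 0 & -8 \\ 2 & -5 & 9 & -1 \\ 0 & 3 & -1 & 7 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 1 \\ 3 \\ 0 \\ 2 \end{bmatrix}
\]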



12. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation
Ax = b, and write out this matrix equation.

(a) -2x2 + 3x3= -3 
2x1+ ^2 =0 

-3x2 + 4x3= 1 
^1 + X3 = 5 

(b)
\[
\begin{aligned}
3x_1 + 3x_2 + 3x_3 &= -3 \\
-x_1 - 5x_2 - 2x_3 &= 3 \\
-4x_2 + x_3 &= 0
\end{aligned}
\]

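13. In each part, express the matrix equation as a system of linear equations.

(a)
\[
\begin{bmatrix} 5 & 6 & -7 \\ -1 & -2 & 3 \\ 0 & 4 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 1 & 1 & 1 \\ 2 & 3 & 0 \\ 5 & -3 & -6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \\ -9 \end{bmatrix}
\]

Answer:

(a)
\[
\begin{aligned}
5x_1 + 6x_2 - 7x_3 &= 2 \\
-x_1 - 2x_2 + 3x_3 &= 0 \\
4x_2 - x_3 &= 3
\end{aligned}
\]

(b)
\[
\begin{aligned}
x_1 + x_2 + x_3 &= 2 \\
2x_1 + 3x_2 &= 2 \\
5x_1 - 3x_2 - 6x_3 &= -9
\end{aligned}
\]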

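14. In each part, express the matrix equation as a system of linear equations.

(a)
\[
\begin{bmatrix} 3 & -1 & 2 \\ 4 & 3 & 7 \\ -2 & 1 & 5 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 3 & -2 & 0 & 1 \\ 5 & 0 & 2 & -2 \\ 3 & 1 & 4 & 7 \\ -2 & 5 & 1 & 6 \end{bmatrix}
\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]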


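In Exercises 15-16, find all values of k, if any, that satisfy the equation.

15.
\[
\begin{bmatrix} k & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & -3 \end{bmatrix}
\begin{bmatrix} k \\ 1 \\ 1 \end{bmatrix} = 0
\]

Answer:

−1

16.
\[
\begin{bmatrix} 2 & 2 & k \end{bmatrix}
\begin{bmatrix} 1 & 2 & 0 \\ 2 & 0 & 3 \\ 0 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 2 \\ k \end{bmatrix} = 0
\]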



In Exercises 17-18, solve the matrix equation for a, b, c, and d. 

17.
\[
\begin{bmatrix} a & 3 \\ -1 & a + b \end{bmatrix}
= \begin{bmatrix} 4 & d - 2c \\ d + 2c & -2 \end{bmatrix}
\]

Answer:

a = 4, b = −6, c = −1, d = 1



18. 



a-b ll 
3rf+c 2<i-cJ [l 6] 



19. Let J be any mxH matrix and let 0 be the « x » matrix each of whose entries is zero. Show that if kA = 0> then 
k = OoTA = 0- 

(a) Show that if AB and BA are both defined, then AB and BA are square matrices. 

(b) Show that if ^ is an ^ x » matrix and A(BA) is defined, then B is mnxm matrix. 



21. Prove: If A and 5 are « x » matrices, then tr(A + 5) = tr(A) + tr(5) . 

(a) Show that if A has a row of zeros and B is any matrix for which AB is defined, then AB also has a row of 
zeros. 

(b) Find a similar result involving a column of zeros. 

23. In each part, find a 6 x 6 matrix [a_ij] that satisfies the stated condition. Make your answers as general as possible 
by using letters rather than specific numbers for the nonzero entries. 

(a) a_ij = 0 if i ≠ j 

(b) a_ij = 0 if i > j 

(c) a_ij = 0 if i < j 

(d) a_ij = 0 if |i - j| > 1 

Answer: 
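(a)
\[
\begin{bmatrix}
a_{11} & 0 & 0 & 0 & 0 & 0 \\
0 & a_{22} & 0 & 0 & 0 & 0 \\
0 & 0 & a_{33} & 0 & 0 & 0 \\
0 & 0 & 0 & a_{44} & 0 & 0 \\
0 & 0 & 0 & 0 & a_{55} & 0 \\
0 & 0 & 0 & 0 & 0 & a_{66}
\end{bmatrix}
\]

(b)
\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\
0 & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\
0 & 0 & a_{33} & a_{34} & a_{35} & a_{36} \\
0 & 0 & 0 & a_{44} & a_{45} & a_{46} \\
0 & 0 & 0 & 0 & a_{55} & a_{56} \\
0 & 0 & 0 & 0 & 0 & a_{66}
\end{bmatrix}
\]

(c)
\[
\begin{bmatrix}
a_{11} & 0 & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & 0 & 0 & 0 & 0 \\
a_{31} & a_{32} & a_{33} & 0 & 0 & 0 \\
a_{41} & a_{42} & a_{43} & a_{44} & 0 & 0 \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & 0 \\
a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66}
\end{bmatrix}
\]

(d)
\[
\begin{bmatrix}
a_{11} & a_{12} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56} \\
0 & 0 & 0 & 0 & a_{65} & a_{66}
\end{bmatrix}
\]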



24. Find the 4 x 4 matrix A = [a_ij] whose entries satisfy the stated condition. 

(a) a_ij = i + j 

(b) a_ij = i^{j-1} 

(c) a_ij = 1 if |i - j| > 1, and a_ij = -1 if |i - j| ≤ 1 



25. Consider the function y = f(x) defined for 2 x 1 matrices x by y = Ax, where 

\[
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\]

Plot f(x) together with x in each case below. How would you describe the action of f? 



= (i) 



[0^ 

it 





26. Let I be the n x n matrix whose entry in row i and column j is 

1 if i = j 
0 if i ≠ j 

Show that AI = IA = A for every n x n matrix A. 

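27. How many 3 x 3 matrices A can you find such that 

\[
A \begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} x + y \\ x - y \\ 0 \end{bmatrix}
\]

for all choices of x, y, and z? 
Answer: 

One; namely, 
\[
A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]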
28. How many 3 x 3 matrices A can you find such that 

\[
A \begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} xy \\ 0 \\ 0 \end{bmatrix}
\]

for all choices of x, y, and z? 
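29. A matrix B is said to be a square root of a matrix A if BB = A. 

(a) Find two square roots of 
\[
A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}
\]

(b) How many different square roots can you find of 
\[
A = \begin{bmatrix} 5 & 0 \\ 0 & 9 \end{bmatrix}
\]

(c) Do you think that every 2 x 2 matrix has at least one square root? Explain your reasoning. 
Answer: 
(a) 
\[
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix}
\]
(b) Four; 
\[
\begin{bmatrix} \sqrt{5} & 0 \\ 0 & 3 \end{bmatrix}, \quad
\begin{bmatrix} -\sqrt{5} & 0 \\ 0 & 3 \end{bmatrix}, \quad
\begin{bmatrix} \sqrt{5} & 0 \\ 0 & -3 \end{bmatrix}, \quad
\begin{bmatrix} -\sqrt{5} & 0 \\ 0 & -3 \end{bmatrix}
\]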



30. Let 0 denote a 2 x 2 matrix, each of whose entries is zero. 

(a) Is there a 2 x 2 matrix A such that A ≠ 0 and AA = 0? Justify your answer. 

(b) Is there a 2 x 2 matrix A such that A ≠ 0 and AA = A? Justify your answer. 



True-False Exercises 



In parts (a)-(o) determine whether the statement is true or false, and justify your answer. 



(a) The matrix 
\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
\]
has no main diagonal. 

Answer: 

True 

(b) An m x n matrix has m column vectors and n row vectors. 
Answer: 

False 

(c) If A and B are 2 x 2 matrices, then AB = BA. 
Answer: 

False 

(d) The ith row vector of a matrix product AB can be computed by multiplying A by the ith row vector of B. 
Answer: 

False 

(e) For every matrix A, it is true that (A^T)^T = A. 

Answer: 

True 

(f) If A and B are square matrices of the same order, then tr(AB) = tr(A)tr(B). 
Answer: 

False 

(g) If A and B are square matrices of the same order, then (AB)^T = A^T B^T. 

Answer: 

False 

(h) For every square matrix A, it is true that tr(A^T) = tr(A). 

Answer: 

True 

(i) If A is a 6 x 4 matrix and B is an m x n matrix such that B^T A^T is a 2 x 6 matrix, then m = 4 and n = 2. 

Answer: 

True 

(j) If A is an n x n matrix and c is a scalar, then tr(cA) = c tr(A). 
Answer: 

True 

(k) If A, B, and C are matrices of the same size such that A - C = B - C, then A = B. 
Answer: 
True 

(l) If A, B, and C are square matrices of the same order such that AC = BC, then A = B. 
Answer: 
False 

(m) If AB + BA is defined, then A and B are square matrices of the same size. 
Answer: 

True 

(n) If B has a column of zeros, then so does AB if this product is defined. 
Answer: 
True 

(o) If B has a column of zeros, then so does BA if this product is defined. 
Answer: 
False 
1.4 Inverses; Algebraic Properties of Matrices 

In this section we will discuss some of the algebraic properties of matrix operations. We will see that many of 
the basic rules of arithmetic for real numbers hold for matrices, but we will also see that some do not. 



Properties of Matrix Addition and Scalar Multiplication 

The following theorem lists the basic algebraic properties of the matrix operations. 



THEOREM 1.4.1 Properties of Matrix Arithmetic 

Assuming that the sizes of the matrices are such that the indicated operations can be performed, the 
following rules of matrix arithmetic are valid. 
(a) A + B = B + A (Commutative law for addition) 
(b) A + (B + C) = (A + B) + C (Associative law for addition) 
(c) A(BC) = (AB)C (Associative law for multiplication) 
(d) A(B + C) = AB + AC (Left distributive law) 
(e) (B + C)A = BA + CA (Right distributive law) 
(f) A(B - C) = AB - AC 
(g) (B - C)A = BA - CA 
(h) a(B + C) = aB + aC 
(i) a(B - C) = aB - aC 
(j) (a + b)C = aC + bC 
(k) (a - b)C = aC - bC 
(l) a(bC) = (ab)C 
(m) a(BC) = (aB)C = B(aC) 



To prove any of the equalities in this theorem we must show that the matrix on the left side has the same size 
as that on the right and that the corresponding entries on the two sides are the same. Most of the proofs follow 
the same pattern, so we will prove part (d) as a sample. The proof of the associative law for multiplication is 
more complicated than the rest and is outlined in the exercises. 



There are three basic ways to prove that two 
matrices of the same size are equal: prove that 
corresponding entries are the same, prove that 
corresponding row vectors are the same, or 
prove that corresponding column vectors are 
the same. 



Proof (d) We must show that A(B + C) and AB + AC have the same size and that corresponding entries 
are equal. To form A(B + C), the matrices B and C must have the same size, say m x n, and the matrix A 
must then have m columns, so its size must be of the form r x m. This makes A(B + C) an r x n matrix. It 
follows that AB + AC is also an r x n matrix and, consequently, A(B + C) and AB + AC have the same size. 

Suppose that A = [a_ij], B = [b_ij], and C = [c_ij]. We want to show that corresponding entries of 
A(B + C) and AB + AC are equal; that is, 

\[
[A(B + C)]_{ij} = [AB + AC]_{ij}
\]

for all values of i and j. But from the definitions of matrix addition and matrix multiplication, we have 

\[
\begin{aligned}
[A(B + C)]_{ij} &= a_{i1}(b_{1j} + c_{1j}) + a_{i2}(b_{2j} + c_{2j}) + \cdots + a_{im}(b_{mj} + c_{mj}) \\
&= (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj}) \\
&= [AB]_{ij} + [AC]_{ij} = [AB + AC]_{ij}
\end{aligned}
\]

Remark Although the operations of matrix addition and matrix multiplication were defined for pairs of 
matrices, associative laws (b) and (c) enable us to denote sums and products of three matrices as A + B + C 
and ABC without inserting any parentheses. This is justified by the fact that no matter how parentheses are 
inserted, the associative laws guarantee that the same end result will be obtained. In general, given any sum or 
any product of matrices, pairs of parentheses can be inserted or deleted anywhere within the expression 
without affecting the end result. 



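EXAMPLE 1 Associativity of Matrix Multiplication 

As an illustration of the associative law for matrix multiplication, consider 

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
\]

Then 

\[
AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}
= \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix}
\quad \text{and} \quad
BC = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
= \begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix}
\]

Thus 

\[
(AB)C = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
= \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix}
\]

and 

\[
A(BC) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix}
= \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix}
\]

so (AB)C = A(BC), as guaranteed by Theorem 1.4.1(c). 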
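
The arithmetic in Example 1 can be checked mechanically. The following is a minimal Python sketch, assuming matrices are stored as lists of rows; the helper name `matmul` is illustrative and does not come from the text.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows (row-column rule)."""
    inner = len(Y)
    # The product is defined only if X has as many columns as Y has rows.
    assert all(len(row) == inner for row in X), "sizes do not conform"
    return [[sum(row[k] * Y[k][j] for k in range(inner))
             for j in range(len(Y[0]))]
            for row in X]

A = [[1, 2], [3, 4], [0, 1]]
B = [[4, 3], [2, 1]]
C = [[1, 0], [2, 3]]

left = matmul(matmul(A, B), C)    # (AB)C
right = matmul(A, matmul(B, C))   # A(BC)
print(left)
print(left == right)
```

Both groupings produce the same 3 x 2 matrix, which is what the associative law predicts for any conformable A, B, and C.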



Properties of Matrix Multiplication 

Do not let Theorem 1.4.1 lull you into believing that all laws of real arithmetic carry over to matrix 
arithmetic. For example, you know that in real arithmetic it is always true that ab = ba, which is called the 
commutative law for multiplication. In matrix arithmetic, however, the equality of AB and BA can fail for 
three possible reasons: 

1. AB may be defined and BA may not (for example, if A is 2 x 3 and B is 3 x 4). 

2. AB and BA may both be defined, but they may have different sizes (for example, if A is 2 x 3 and B is 
3 x 2). 

3. AB and BA may both be defined and have the same size, but the two matrices may be different (as 
illustrated in the next example). 

Do not read too much into Example 2: it does 
not rule out the possibility that AB and BA may 
be equal in certain cases, just that they are not 
equal in all cases. If it so happens that 
AB = BA, then we say that A and B 
commute. 



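EXAMPLE 2 Order Matters in Matrix Multiplication 

Consider the matrices 

\[
A = \begin{bmatrix} -1 & 0 \\ 2 & 3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}
\]

Multiplying gives 

\[
AB = \begin{bmatrix} -1 & -2 \\ 11 & 4 \end{bmatrix}
\quad \text{and} \quad
BA = \begin{bmatrix} 3 & 6 \\ -3 & 0 \end{bmatrix}
\]

Thus, AB ≠ BA. 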
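
As a quick check of Example 2, here is a short Python sketch that computes both products and compares them. It assumes matrices are lists of rows; `matmul` and `commute` are hypothetical helper names chosen for this illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(row[k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for row in X]

def commute(X, Y):
    """Return True when XY and YX are equal (both products must be defined)."""
    return matmul(X, Y) == matmul(Y, X)

A = [[-1, 0], [2, 3]]
B = [[1, 2], [3, 0]]
I2 = [[1, 0], [0, 1]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB, BA)
print(commute(A, B))    # this particular pair does not commute
print(commute(A, I2))   # every 2 x 2 matrix commutes with the identity
```

The second call to `commute` is a reminder that the example rules out commutativity in general, not in every case.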



Zero Matrices 



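A matrix whose entries are all zero is called a zero matrix. Some examples are 

\[
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
[0]
\]

We will denote a zero matrix by 0 unless it is important to specify its size, in which case we will denote the 
m x n zero matrix by 0_{m x n}. 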



It should be evident that if A and 0 are matrices with the same size, then 

A + 0 = 0 + A = A 

Thus, 0 plays the same role in this matrix equation that the number 0 plays in the numerical equation 
a + 0 = 0 + a = a. 

The following theorem lists the basic properties of zero matrices. Since the results should be self-evident, we 
will omit the formal proofs. 

THEOREM 1.4.2 Properties of Zero Matrices 

If c is a scalar, and if the sizes of the matrices are such that the operations can be performed, then: 
(a) A + 0 = 0 + A = A 
(b) A - 0 = A 
(c) A - A = A + (-A) = 0 
(d) 0A = 0 
(e) If cA = 0, then c = 0 or A = 0. 



Since we know that the commutative law of real arithmetic is not valid in matrix arithmetic, it should not be 
surprising that there are other rules that fail as well. For example, consider the following two laws of real 
arithmetic: 

• If ab = ac and a ≠ 0, then b = c. [The cancellation law] 

• If ab = 0, then at least one of the factors on the left is 0. 

The next two examples show that these laws are not universally true in matrix arithmetic. 

EXAMPLE 3 Failure of the Cancellation Law 

Consider the matrices 

\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 1 \\ 3 & 4 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & 5 \\ 3 & 4 \end{bmatrix}
\]

We leave it for you to confirm that 

\[
AB = AC = \begin{bmatrix} 3 & 4 \\ 6 & 8 \end{bmatrix}
\]

Although A ≠ 0, canceling A from both sides of the equation AB = AC would lead to the 
incorrect conclusion that B = C. Thus, the cancellation law does not hold, in general, for matrix 
multiplication. 



EXAMPLE 4 A Zero Product with Nonzero Factors 

Here are two matrices for which AB = 0, but A ≠ 0 and B ≠ 0: 

\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 7 \\ 0 & 0 \end{bmatrix}
\]
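
The two failures described in Examples 3 and 4 can be confirmed numerically. This is a minimal Python sketch under the assumption that matrices are lists of rows; `matmul` is an illustrative helper, and the name `D` for the second factor of the zero product is chosen here only to avoid clashing with `B`.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(row[k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for row in X]

A = [[0, 1], [0, 2]]
B = [[1, 1], [3, 4]]
C = [[2, 5], [3, 4]]
D = [[3, 7], [0, 0]]
ZERO = [[0, 0], [0, 0]]

AB = matmul(A, B)
AC = matmul(A, C)
AD = matmul(A, D)

print(AB == AC, B == C)                     # equal products, unequal factors
print(AD == ZERO, A == ZERO, D == ZERO)     # zero product, nonzero factors
```

In both cases the culprit is the same: A has a column of zeros, so it is singular, and a singular factor cannot be canceled.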



Identity Matrices 



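A square matrix with 1's on the main diagonal and zeros elsewhere is called an identity matrix. Some 
examples are 

\[
[1], \quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

An identity matrix is denoted by the letter I. If it is important to emphasize the size, we will write I_n for the 
n x n identity matrix. 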



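To explain the role of identity matrices in matrix arithmetic, let us consider the effect of multiplying a general 
2 x 3 matrix A on each side by an identity matrix. Multiplying on the right by the 3 x 3 identity matrix yields 

\[
AI_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A
\]

and multiplying on the left by the 2 x 2 identity matrix yields 

\[
I_2 A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A
\]

The same result holds in general; that is, if A is any m x n matrix, then 

AI_n = A and I_m A = A 

Thus, the identity matrices play the same role in these matrix equations that the number 1 plays in the 
numerical equation a · 1 = 1 · a = a. 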
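
The rule AI_n = A and I_m A = A is easy to test on a concrete 2 x 3 matrix. The sketch below is one way to do so in Python, assuming matrices are lists of rows; `identity` and `matmul` are illustrative helper names, and the entries of `A` are arbitrary sample values.

```python
def identity(n):
    """Build the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(row[k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for row in X]

A = [[2, -1, 7], [0, 4, 5]]          # an arbitrary 2 x 3 matrix

right_product = matmul(A, identity(3))   # A I_3
left_product = matmul(identity(2), A)    # I_2 A
print(right_product == A, left_product == A)
```

Note that the two identity matrices have different sizes; the size is forced by which side of A the identity appears on.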



As the next theorem shows, identity matrices arise naturally in studying reduced row echelon forms of square 
matrices. 



THEOREM 1.4.3 

If R is the reduced row echelon form of an n x n matrix A, then either R has a row of zeros or R is the 
identity matrix I_n. 

Proof Suppose that the reduced row echelon form of A is 

\[
R = \begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & & \vdots \\
r_{n1} & r_{n2} & \cdots & r_{nn}
\end{bmatrix}
\]

Either the last row in this matrix consists entirely of zeros or it does not. If not, the matrix contains no zero 
rows, and consequently each of the n rows has a leading entry of 1. Since these leading 1's occur 
progressively farther to the right as we move down the matrix, each of these 1's must occur on the main 
diagonal. Since the other entries in the same column as one of these 1's are zero, R must be I_n. Thus, either R 
has a row of zeros or R = I_n. 



Inverse of a Matrix 

In real arithmetic every nonzero number a has a reciprocal a^{-1} (= 1/a) with the property 

a · a^{-1} = a^{-1} · a = 1 

The number a^{-1} is sometimes called the multiplicative inverse of a. Our next objective is to develop an 
analog of this result for matrix arithmetic. For this purpose we make the following definition. 

DEFINITION 1 

If A is a square matrix, and if a matrix B of the same size can be found such that AB = BA = I, then A 
is said to be invertible (or nonsingular) and B is called an inverse of A. If no such matrix B can be 
found, then A is said to be singular. 

Remark The relationship AB = BA = I is not changed by interchanging A and B, so if A is invertible and B 
is an inverse of A, then it is also true that B is invertible, and A is an inverse of B. Thus, when 

AB = BA = I 

we say that A and B are inverses of one another. 



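EXAMPLE 5 An Invertible Matrix 

Let 

\[
A = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}
\]

Then 

\[
AB = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix}
\begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]

\[
BA = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]

Thus, A and B are invertible and each is an inverse of the other. 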



EXAMPLE 6 Class of Singular Matrices 

In general, a square matrix with a row or column of zeros is singular. To help understand why 
this is so, consider the matrix 

\[
A = \begin{bmatrix} 1 & 4 & 0 \\ 2 & 5 & 0 \\ 3 & 6 & 0 \end{bmatrix}
\]

To prove that A is singular we must show that there is no 3 x 3 matrix B such that AB = BA = I. 
For this purpose let c_1, c_2, 0 be the column vectors of A. Thus, for any 3 x 3 matrix B we 
can express the product BA as 

BA = B[c_1 c_2 0] = [Bc_1 Bc_2 0] [Formula (6) of Section 1.3] 

The column of zeros shows that BA ≠ I and hence that A is singular. 



Properties of Inverses 



It is reasonable to ask whether an invertible matrix can have more than one inverse. The next theorem shows 
that the answer is no — an invertible matrix has exactly one inverse. 



THEOREM 1.4.4 

If B and C are both inverses of the matrix A, then B = C. 

Proof Since B is an inverse of A, we have BA = I. Multiplying both sides on the right by C gives 
(BA)C = IC = C. But it is also true that (BA)C = B(AC) = BI = B, so C = B. 



As a consequence of this important result, we can now speak of "the" inverse of an invertible matrix. If A is 
invertible, then its inverse will be denoted by the symbol A^{-1}. Thus, 

AA^{-1} = I and A^{-1}A = I        (1) 

The inverse of A plays much the same role in matrix arithmetic that the reciprocal a^{-1} plays in the numerical 
relationships aa^{-1} = 1 and a^{-1}a = 1. 

In the next section we will develop a method for computing the inverse of an invertible matrix of any size. 
For now we give the following theorem that specifies conditions under which a 2 x 2 matrix is invertible and 
provides a simple formula for its inverse. 



THEOREM 1.4.5 

The matrix 

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

is invertible if and only if ad - bc ≠ 0, in which case the inverse is given by the formula 

\[
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \qquad (2)
\]

We will omit the proof, because we will study a more general version of this theorem later. For now, you 
should at least confirm the validity of Formula (2) by showing that AA^{-1} = A^{-1}A = I. 



Historical Note The formula for A^{-1} given in Theorem 1.4.5 first appeared (in a more general 
form) in Arthur Cayley's 1858 Memoir on the Theory of Matrices. The more general result that 
Cayley discovered will be studied later. 

The quantity ad - bc in Theorem 1.4.5 is 
called the determinant of the 2 x 2 matrix A 
and is denoted by 

det(A) = ad - bc 

or alternatively by 

\[
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc
\]

Remark Figure 1.4.1 illustrates that the determinant of a 2 x 2 matrix A is the product of the entries on its 
main diagonal minus the product of the entries off its main diagonal. In words, Theorem 1.4.5 states that a 
2 x 2 matrix A is invertible if and only if its determinant is nonzero, and if invertible, then its inverse can be 
obtained by interchanging its diagonal entries, reversing the signs of its off-diagonal entries, and multiplying 
the entries by the reciprocal of the determinant of A. 

\[
\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc
\]

Figure 1.4.1 



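EXAMPLE 7 Calculating the Inverse of a 2 x 2 Matrix 

In each part, determine whether the matrix is invertible. If so, find its inverse. 

\[
\text{(a) } A = \begin{bmatrix} 6 & 1 \\ 5 & 2 \end{bmatrix}
\qquad
\text{(b) } A = \begin{bmatrix} -1 & 2 \\ 3 & -6 \end{bmatrix}
\]

Solution 

(a) The determinant of A is det(A) = (6)(2) - (1)(5) = 7, which is nonzero. Thus, A is 
invertible, and its inverse is 

\[
A^{-1} = \frac{1}{7} \begin{bmatrix} 2 & -1 \\ -5 & 6 \end{bmatrix}
= \begin{bmatrix} \tfrac{2}{7} & -\tfrac{1}{7} \\ -\tfrac{5}{7} & \tfrac{6}{7} \end{bmatrix}
\]

We leave it for you to confirm that AA^{-1} = A^{-1}A = I. 
(b) The matrix is not invertible since det(A) = (-1)(-6) - (2)(3) = 0. 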
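
Formula (2) translates directly into a few lines of code. The following Python sketch is one possible rendering, assuming integer entries and exact arithmetic via the standard `fractions` module; the function name `inverse_2x2` and the choice to return `None` for a singular matrix are conventions adopted here, not part of the text.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Invert a 2 x 2 matrix using the ad - bc formula; None if singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None          # determinant zero: no inverse exists
    # Swap the diagonal entries, negate the off-diagonal ones, divide by det.
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

part_a = inverse_2x2([[6, 1], [5, 2]])
part_b = inverse_2x2([[-1, 2], [3, -6]])
print(part_a)
print(part_b)
```

Using `Fraction` keeps entries such as 2/7 exact, so the result can be compared with the hand computation without rounding concerns.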



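EXAMPLE 8 Solution of a Linear System by Matrix Inversion 

A problem that arises in many applications is to solve a pair of equations of the form 

u = ax + by 
v = cx + dy 

for x and y in terms of u and v. One approach is to treat this as a linear system of two equations in the 
unknowns x and y and use Gauss-Jordan elimination to solve for x and y. However, because the 
coefficients of the unknowns are literal rather than numerical, this procedure is a little clumsy. As an 
alternative approach, let us replace the two equations by the single matrix equation 

\[
\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}
\]

which we can rewrite as 

\[
\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
\]

If we assume that the 2 x 2 matrix is invertible (i.e., ad - bc ≠ 0), then we can multiply through on 
the left by the inverse and rewrite the equation as 

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} u \\ v \end{bmatrix}
= \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
\]

which simplifies to 

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}
\]

Using Theorem 1.4.5, we can rewrite this equation as 

\[
\frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}
\]

from which we obtain 

\[
x = \frac{du - bv}{ad - bc}, \qquad y = \frac{av - cu}{ad - bc}
\]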
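
The closed-form solution of Example 8 can be packaged as a small function. This Python sketch assumes integer coefficients and uses exact fractions; the name `solve_2x2` and its argument order are hypothetical. The sample system 3x - 2y = -1, 4x + 5y = 3 is chosen only for illustration.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, u, v):
    """Solve u = a*x + b*y, v = c*x + d*y for (x, y) when ad - bc is nonzero."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("coefficient matrix is singular")
    x = Fraction(d * u - b * v, det)
    y = Fraction(a * v - c * u, det)
    return x, y

x, y = solve_2x2(3, -2, 4, 5, -1, 3)
print(x, y)
# Substitute back into both equations as a sanity check.
print(3 * x - 2 * y == -1, 4 * x + 5 * y == 3)
```

The method only applies when the coefficient matrix is invertible; a zero determinant raises an error instead of returning a misleading answer.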



The next theorem is concerned with inverses of matrix products. 

THEOREM 1.4.6 

If A and B are invertible matrices with the same size, then AB is invertible and 

(AB)^{-1} = B^{-1}A^{-1} 

Proof We can establish the invertibility and obtain the stated formula at the same time by showing that 

(AB)(B^{-1}A^{-1}) = (B^{-1}A^{-1})(AB) = I 

But 

(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I 

and similarly, (B^{-1}A^{-1})(AB) = I. 

Although we will not prove it, this result can be extended to three or more factors: 

A product of any number of invertible matrices is invertible, and the inverse of the product is the 
product of the inverses in the reverse order. 



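EXAMPLE 9 The Inverse of a Product 

Consider the matrices 

\[
A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix}
\]

We leave it for you to show that 

\[
AB = \begin{bmatrix} 7 & 6 \\ 9 & 8 \end{bmatrix}, \quad
(AB)^{-1} = \begin{bmatrix} 4 & -3 \\ -\tfrac{9}{2} & \tfrac{7}{2} \end{bmatrix}
\]

and also that 

\[
A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}, \quad
B^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \tfrac{3}{2} \end{bmatrix}, \quad
B^{-1}A^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \tfrac{3}{2} \end{bmatrix}
\begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 4 & -3 \\ -\tfrac{9}{2} & \tfrac{7}{2} \end{bmatrix}
\]

Thus, (AB)^{-1} = B^{-1}A^{-1} as guaranteed by Theorem 1.4.6. 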

Powers of a Matrix 

If A is a square matrix, then we define the nonnegative integer powers of A to be 

A^0 = I and A^n = AA···A [n factors] 

and if A is invertible, then we define the negative integer powers of A to be 

A^{-n} = (A^{-1})^n = A^{-1}A^{-1}···A^{-1} [n factors] 

Because these definitions parallel those for real numbers, the usual laws of nonnegative exponents hold; for 
example, 

A^r A^s = A^{r+s} and (A^r)^s = A^{rs} 

If a product of matrices is singular, then at least 
one of the factors must be singular. Why? 

In addition, we have the following properties of negative exponents. 

THEOREM 1.4.7 

If A is invertible and n is a nonnegative integer, then: 

(a) A^{-1} is invertible and (A^{-1})^{-1} = A. 

(b) A^n is invertible and (A^n)^{-1} = A^{-n} = (A^{-1})^n. 

(c) kA is invertible for any nonzero scalar k, and (kA)^{-1} = k^{-1}A^{-1}. 

We will prove part (c) and leave the proofs of parts (a) and (b) as exercises. 
Proof (c) Properties (c) and (m) in Theorem 1.4.1 imply that 

(kA)(k^{-1}A^{-1}) = k^{-1}(kA)A^{-1} = (k^{-1}k)AA^{-1} = (1)I = I 

and similarly, (k^{-1}A^{-1})(kA) = I. Thus, kA is invertible and (kA)^{-1} = k^{-1}A^{-1}. 



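EXAMPLE 10 Properties of Exponents 

Let A and A^{-1} be the matrices in Example 9; that is, 

\[
A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}
\quad \text{and} \quad
A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}
\]

Then 

\[
A^{-3} = (A^{-1})^3 =
\begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix}
\]

Also, 

\[
A^3 =
\begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}
= \begin{bmatrix} 11 & 30 \\ 15 & 41 \end{bmatrix}
\]

so, as expected from Theorem 1.4.7(b), 

\[
(A^3)^{-1} = \frac{1}{(11)(41) - (30)(15)}
\begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix}
= \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix}
= (A^{-1})^3
\]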
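
The definitions of positive and negative powers can be sketched in code as repeated multiplication. The Python below is a minimal illustration restricted to 2 x 2 matrices (because it reuses the ad - bc inverse formula); the helper names `matmul`, `identity`, `inverse_2x2`, and `power` are assumptions made for this sketch.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(row[k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for row in X]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def power(M, n):
    """Return M^n for any integer n; negative n uses (M^{-1})^{|n|}."""
    if n < 0:
        M, n = inverse_2x2(M), -n
    result = identity(len(M))        # M^0 = I
    for _ in range(n):
        result = matmul(result, M)
    return result

A = [[1, 2], [1, 3]]
cube = power(A, 3)
neg_cube = power(A, -3)
print(cube, neg_cube)
print(inverse_2x2(cube) == neg_cube)   # (A^3)^{-1} equals (A^{-1})^3
```

Repeated multiplication is the simplest approach and is adequate for small exponents; faster schemes such as repeated squaring exist but are not needed here.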



EXAMPLE 11 The Square of a Matrix Sum 

In real arithmetic, where we have a commutative law for multiplication, we can write 

(a + b)^2 = a^2 + ab + ba + b^2 = a^2 + ab + ab + b^2 = a^2 + 2ab + b^2 

However, in matrix arithmetic, where we have no commutative law for multiplication, the best 
we can do is to write 

(A + B)^2 = A^2 + AB + BA + B^2 

It is only in the special case where A and B commute (i.e., AB = BA) that we can go a step 
further and write 

(A + B)^2 = A^2 + 2AB + B^2 
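
To see the distinction in Example 11 with concrete numbers, the following Python sketch expands the square of a sum for a pair of matrices that do not commute. The matrices are sample values chosen for illustration, and `matmul` and `matadd` are hypothetical helper names.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(row[k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for row in X]

def matadd(*Ms):
    """Add any number of matrices of the same size, entry by entry."""
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*Ms)]

A = [[-1, 0], [2, 3]]
B = [[1, 2], [3, 0]]

S = matadd(A, B)
square_of_sum = matmul(S, S)                                   # (A + B)^2
four_terms = matadd(matmul(A, A), matmul(A, B),
                    matmul(B, A), matmul(B, B))                # A^2 + AB + BA + B^2
binomial = matadd(matmul(A, A), matmul(A, B),
                  matmul(A, B), matmul(B, B))                  # A^2 + 2AB + B^2

print(square_of_sum == four_terms)
print(square_of_sum == binomial)
```

The four-term expansion always agrees with the square of the sum, while the familiar binomial form fails here because AB and BA differ.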



Matrix Polynomials 



If A is a square matrix, say n x n, and if 

p(x) = a_0 + a_1 x + a_2 x^2 + ··· + a_m x^m 

is any polynomial, then we define the n x n matrix p(A) to be 

p(A) = a_0 I + a_1 A + a_2 A^2 + ··· + a_m A^m        (3) 

where I is the n x n identity matrix; that is, p(A) is obtained by substituting A for x and replacing the constant 
term a_0 by the matrix a_0 I. An expression of form (3) is called a matrix polynomial in A. 

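EXAMPLE 12 A Matrix Polynomial 

Find p(A) for 

\[
p(x) = x^2 - 2x - 3
\quad \text{and} \quad
A = \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}
\]

Solution 

\[
\begin{aligned}
p(A) &= A^2 - 2A - 3I \\
&= \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}^2
 - 2 \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}
 - 3 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 4 \\ 0 & 9 \end{bmatrix}
 - \begin{bmatrix} -2 & 4 \\ 0 & 6 \end{bmatrix}
 - \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}
 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\end{aligned}
\]

or more briefly, p(A) = 0. 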
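
A matrix polynomial of form (3) can be evaluated without computing each power separately. The Python sketch below uses Horner's rule, which is a choice made here for brevity rather than the method of the text; `poly_of_matrix` and the coefficient ordering `[a_0, a_1, ..., a_m]` are assumptions of this illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(row[k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for row in X]

def poly_of_matrix(coeffs, A):
    """Evaluate a_0 I + a_1 A + ... + a_m A^m, with coeffs = [a_0, ..., a_m]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        # Horner step: result <- result * A + c * I
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c
    return result

A = [[-1, 2], [0, 3]]
value = poly_of_matrix([-3, -2, 1], A)   # p(x) = x^2 - 2x - 3
print(value)
```

Adding `c` only on the diagonal is how the constant term a_0 becomes the matrix a_0 I.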



Remark It follows from the fact that A^r A^s = A^{r+s} = A^{s+r} = A^s A^r that powers of a square matrix 
commute, and since a matrix polynomial in A is built up from powers of A, any two matrix polynomials in A 
also commute; that is, for any polynomials p_1 and p_2 we have 

p_1(A)p_2(A) = p_2(A)p_1(A)        (4) 



Properties of the Transpose 



The following theorem lists the main properties of the transpose. 



THEOREM 1.4.8 

If the sizes of the matrices are such that the stated operations can be performed, then: 

(a) (A^T)^T = A 

(b) (A + B)^T = A^T + B^T 

(c) (A - B)^T = A^T - B^T 

(d) (kA)^T = kA^T 

(e) (AB)^T = B^T A^T 

If you keep in mind that transposing a matrix interchanges its rows and columns, then you should have little 
trouble visualizing the results in parts (a)-(d). For example, part (a) states the obvious fact that interchanging 
rows and columns twice leaves a matrix unchanged; and part (b) states that adding two matrices and then 
interchanging the rows and columns produces the same result as interchanging the rows and columns before 
adding. We will omit the formal proofs. Part (e) is less obvious, but for brevity we will omit its proof as 
well. The result in that part can be extended to three or more factors and restated as: 

The transpose of a product of any number of matrices is the product of the transposes in the reverse 
order. 



The following theorem establishes a relationship between the inverse of a matrix and the inverse of its 
transpose. 

THEOREM 1.4.9 

If A is an invertible matrix, then A^T is also invertible and 

(A^T)^{-1} = (A^{-1})^T 

Proof We can establish the invertibility and obtain the formula at the same time by showing that 

A^T (A^{-1})^T = (A^{-1})^T A^T = I 

But from part (e) of Theorem 1.4.8 and the fact that I^T = I, we have 

A^T (A^{-1})^T = (A^{-1}A)^T = I^T = I 
(A^{-1})^T A^T = (AA^{-1})^T = I^T = I 

which completes the proof. 



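EXAMPLE 13 Inverse of a Transpose 

Consider a general 2 x 2 invertible matrix and its transpose: 

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\quad \text{and} \quad
A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix}
\]

Since A is invertible, its determinant ad - bc is nonzero. But the determinant of A^T is also 
ad - bc (verify), so A^T is also invertible. It follows from Theorem 1.4.5 that 

\[
(A^T)^{-1} = \begin{bmatrix}
\dfrac{d}{ad - bc} & -\dfrac{c}{ad - bc} \\[2ex]
-\dfrac{b}{ad - bc} & \dfrac{a}{ad - bc}
\end{bmatrix}
\]

which is the same matrix that results if A^{-1} is transposed (verify). Thus 

(A^T)^{-1} = (A^{-1})^T 

as guaranteed by Theorem 1.4.9. 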
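
Two of the transpose rules, (AB)^T = B^T A^T and (A^T)^{-1} = (A^{-1})^T, can be spot-checked on numerical matrices. This is a minimal Python sketch with sample 2 x 2 matrices chosen for illustration; `transpose`, `matmul`, and `inverse_2x2` are hypothetical helper names.

```python
from fractions import Fraction

def transpose(M):
    """Interchange the rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(row[k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for row in X]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 2], [1, 3]]
B = [[3, 2], [2, 2]]

product_rule = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
inverse_rule = inverse_2x2(transpose(A)) == transpose(inverse_2x2(A))
print(product_rule, inverse_rule)
```

A numerical check on one pair of matrices is not a proof, but it is a useful guard against transposing the factors in the wrong order.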



Concept Review 

• Commutative law for matrix addition 
• Associative law for matrix addition 
• Associative law for matrix multiplication 
• Left and right distributive laws 
• Zero matrix 
• Identity matrix 
• Inverse of a matrix 
• Invertible matrix 
• Nonsingular matrix 
• Singular matrix 
• Determinant 
• Power of a matrix 
• Matrix polynomial 

Skills 

• Know the arithmetic properties of matrix operations. 
• Be able to prove arithmetic properties of matrices. 
• Know the properties of zero matrices. 
• Know the properties of identity matrices. 
• Be able to recognize when two square matrices are inverses of each other. 
• Be able to determine whether a 2 x 2 matrix is invertible. 
• Be able to solve a linear system of two equations in two unknowns whose coefficient matrix is 
invertible. 
• Be able to prove basic properties involving invertible matrices. 
• Know the properties of the matrix transpose and its relationship with invertible matrices. 



Exercise Set 1.4 

1. Let 

\[
A = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 4 & 5 \\ -2 & 1 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 8 & -3 & -5 \\ 0 & 1 & 2 \\ 4 & -7 & 6 \end{bmatrix}, \quad
C = \begin{bmatrix} 0 & -2 & 3 \\ 1 & 7 & 4 \\ 3 & 5 & 9 \end{bmatrix}
\]

Show that 

(a) A + (B + C) = (A + B) + C 

(b) (AB)C = A(BC) 

(c) (a + b)C = aC + bC 

(d) a(B - C) = aB - aC 

2. Using the matrices and scalars in Exercise 1, verify that 

(a) a(BC) = (aB)C = B(aC) 

(b) A(B - C) = AB - AC 

(c) (B + C)A = BA + CA 

(d) a(bC) = (ab)C 

3. Using the matrices and scalars in Exercise 1, verify that 

(b) (A + B)^T = A^T + B^T 

(c) (aC)^T = aC^T 

(d) (AB)^T = B^T A^T 



In Exercises 4-7 use Theorem 1.4.5 to compute the inverses of the following matrices. 



B 



Answer: 



i J. 

5 20 

_i -L 

5 10 



Answer: 



8. Find the inverse of 



9. Find the inverse of 



Answer: 



[cos 9 sinf/l 
^sin 0 cos £P J 



10. Use the matrix A in Exercise 4 to verify that (A^T)^{-1} = (A^{-1})^T. 

11. Use the matrix B in Exercise 5 to verify that (B^T)^{-1} = (B^{-1})^T. 

12. Use the matrices A and B in Exercises 4 and 5 to verify that (AB)^{-1} = B^{-1}A^{-1}. 

13. Use the matrices A, B, and C in Exercises 4-6 to verify that (ABC)^{-1} = C^{-1}B^{-1}A^{-1}. 

In Exercises 14-17, use the given information to find A. 



13 13 

13 ~13 

18. Let A be the matrix 



In each part, compute the given quantity. 



(a) A^3 

(b) A^{-3} 

(c) A^2 - 2A + I 

(d) p(A), where p(x) = x - 2 

(e) p(A), where p(x) = 2x^2 - x + 1 

(f) p(A), where p(x) = x^3 - 2x + 4 



19. Repeat Exercise 18 for the matrix 




Answer: 




Answer: 




Answer: 



(a) [41 15] 

30 llj 

(b) f 11 -151 
[-30 4lJ 

[I 3 

(f) [39 13] 
[26 13j 



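20. Repeat Exercise 18 for the matrix 

\[
A = \begin{bmatrix} 3 & 0 & -1 \\ 0 & -2 & 0 \\ 5 & 0 & 2 \end{bmatrix}
\]

21. Repeat Exercise 18 for the matrix 

\[
A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -1 & 3 \\ 0 & -3 & -1 \end{bmatrix}
\]

Answer: 

(a) 
\[
\begin{bmatrix} 27 & 0 & 0 \\ 0 & 26 & -18 \\ 0 & 18 & 26 \end{bmatrix}
\]

(b) 
\[
\begin{bmatrix} \tfrac{1}{27} & 0 & 0 \\ 0 & 0.026 & 0.018 \\ 0 & -0.018 & 0.026 \end{bmatrix}
\]

(c) 
\[
\begin{bmatrix} 4 & 0 & 0 \\ 0 & -5 & -12 \\ 0 & 12 & -5 \end{bmatrix}
\]

(d) 
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 3 \\ 0 & -3 & -3 \end{bmatrix}
\]

(e) 
\[
\begin{bmatrix} 16 & 0 & 0 \\ 0 & -14 & -15 \\ 0 & 15 & -14 \end{bmatrix}
\]

(f) 
\[
\begin{bmatrix} 25 & 0 & 0 \\ 0 & 32 & -24 \\ 0 & 24 & 32 \end{bmatrix}
\]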


In Exercises 22-24, let p_1(x) = x^2 - 9, p_2(x) = x + 3, and p_3(x) = x - 3. Show that 
p_1(A) = p_2(A)p_3(A) for the given matrix. 

22. The matrix A in Exercise 18. 

23. The matrix A in Exercise 21. 

24. An arbitrary square matrix A. 

25. Show that if p(x) = x^2 - (a + d)x + (ad - bc) and 

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

then p(A) = 0. 

26. Show that if p(x) = x^3 - (a + b + e)x^2 + (ab + ae + be - cd)x - a(be - cd) and 

\[
A = \begin{bmatrix} a & 0 & 0 \\ 0 & b & c \\ 0 & d & e \end{bmatrix}
\]

then p(A) = 0. 
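27. Consider the matrix 

\[
A = \begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{bmatrix}
\]

where a_{11}a_{22} ··· a_{nn} ≠ 0. Show that A is invertible and find its inverse. 
Answer: 

\[
A^{-1} = \begin{bmatrix}
\dfrac{1}{a_{11}} & 0 & \cdots & 0 \\
0 & \dfrac{1}{a_{22}} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \dfrac{1}{a_{nn}}
\end{bmatrix}
\]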


28. Show that if a square matrix A satisfies A^2 - 3A + I = 0, then A^{-1} = 3I - A. 

29. (a) Show that a matrix with a row of zeros cannot have an inverse. 

(b) Show that a matrix with a column of zeros cannot have an inverse. 

30. Assuming that all matrices are n x n and invertible, solve for D. 

ABC^T DBA^T C = AB^T 

31. Assuming that all matrices are n x n and invertible, solve for D. 



Answer: 



D = CA-'^B-'^A-'^BC'^ 



32. If A is a square matrix and n is a positive integer, is it true that (A^n)^T = (A^T)^n? Justify your answer. 
33. Simplify: 



Answer: 



-1 



B 

34. Simplify: 



[aC-')~\aC-')[aC-')~' AD-' 



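In Exercises 35-37, determine whether A is invertible, and if so, find the inverse. [Hint: Solve AX = I for X 
by equating corresponding entries on the two sides.] 

35. 
\[
A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
\]

Answer: 

\[
A^{-1} = \begin{bmatrix}
\tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} \\
-\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\
\tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2}
\end{bmatrix}
\]

36. 
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}
\]

37. 
\[
A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 0 \\ -1 & 1 & 1 \end{bmatrix}
\]

Answer: 

\[
A^{-1} = \begin{bmatrix}
\tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} \\
-\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\
1 & 0 & 0
\end{bmatrix}
\]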



38. Prove Theorem 1.4.2. 

In Exercises 39-42, use the method of Example 8 to find the unique solution of the given linear system. 

39. 3x_1 - 2x_2 = -1 
    4x_1 + 5x_2 =  3 

Answer: 

x_1 = 1/23, x_2 = 13/23 

40. -x_1 + 5x_2 = 4 
    -x_1 - 3x_2 = 1 

41. 6x_1 +  x_2 =  0 
    4x_1 - 3x_2 = -2 

Answer: 

x_1 = -1/11, x_2 = 6/11 

42. 2x_1 - 2x_2 = 4 
     x_1 + 4x_2 = 4 

43. Prove part (a) of Theorem 1.4.1. 

44. Prove part (c) of Theorem 1.4.1. 

45. Prove part (f) of Theorem 1.4.1. 

46. Prove part (b) of Theorem 1.4.2. 

47. Prove part (c) of Theorem 1.4.2. 

48. Verify Formula 4 in the text by a direct calculation. 

49. Prove part (d) of Theorem 1.4.8. 

50. Prove part (e) of Theorem 1.4.8. 

51. (a) Show that if A is invertible and AB = AC, then B = C. 

(b) Explain why part (a) and Example 3 do not contradict one another. 

52. Show that if A is invertible and k is any nonzero scalar, then (kA)^n = k^n A^n for all integer values of n. 

53. (a) Show that if A, B, and A + B are invertible matrices with the same size, then 

A(A^{-1} + B^{-1})B(A + B)^{-1} = I 

(b) What does the result in part (a) tell you about the matrix A^{-1} + B^{-1}? 



54. A square matrix A is said to be idempotent if A^2 = A. 

(a) Show that if A is idempotent, then so is I - A. 

(b) Show that if A is idempotent, then 2A - I is invertible and is its own inverse. 

55. Show that if A is a square matrix such that A^k = 0 for some positive integer k, then the matrix I - A is 
invertible and 

(I - A)^{-1} = I + A + A^2 + ··· + A^{k-1} 

True-False Exercises 

In parts (a)-(k) determine whether the statement is true or false, and justify your answer. 

(a) Two n x n matrices, A and B, are inverses of one another if and only if AB = BA = 0. 
Answer: 

False 

(b) For all square matrices A and B of the same size, it is true that (A + B)^2 = A^2 + 2AB + B^2. 

Answer: 

False 

(c) For all square matrices A and B of the same size, it is true that A^2 - B^2 = (A - B)(A + B). 

Answer: 

False 

(d) If A and B are invertible matrices of the same size, then AB is invertible and (AB)^{-1} = A^{-1}B^{-1}. 

Answer: 

False 

(e) If A and B are matrices such that AB is defined, then it is true that (AB)^T = A^T B^T. 

Answer: 

False 

(f) The matrix 

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

is invertible if and only if ad - bc ≠ 0. 

Answer: 

True 

(g) If A and B are matrices of the same size and k is a constant, then (kA + B)^T = kA^T + B^T. 

Answer: 

True 

(h) If A is an invertible matrix, then so is A^T. 
Answer: 

True 

(i) If p(x) = a_0 + a_1 x + a_2 x^2 + ··· + a_m x^m and I is an identity matrix, then p(I) = a_0 + a_1 + a_2 + ··· + a_m. 

Answer: 

False 

(j) A square matrix containing a row or column of zeros cannot be invertible. 
Answer: 
True 

(k) The sum of two invertible matrices of the same size must be invertible. 
Answer: 
False 
1.5 Elementary Matrices and a Method for Finding A^{-1} 

In this section we will develop an algorithm for finding the inverse of a matrix, and we will discuss some of the 
basic properties of invertible matrices. 

In Section 1.1 we defined three elementary row operations on a matrix A:

1. Multiply a row by a nonzero constant c. 

2. Interchange two rows. 

3. Add a constant c times one row to another. 

It should be evident that if we let B be the matrix that results from A by performing one of the operations in this 
list, then the matrix A can be recovered from B by performing the corresponding operation in the following list: 

1. Multiply the same row by 1/c.

2. Interchange the same two rows.

3. If B resulted by adding c times row $r_1$ of A to row $r_2$, then add −c times $r_1$ to $r_2$.

It follows that if B is obtained from A by performing a sequence of elementary row operations, then there is a 
second sequence of elementary row operations, which when applied to B recovers A (Exercise 43). Accordingly, 
we make the following definition. 

DEFINITION 1 

Matrices A and B are said to be row equivalent if either (hence each) can be obtained from the other by 
a sequence of elementary row operations. 



Our next goal is to show how matrix multiplication can be used to carry out an elementary row operation. 



DEFINITION 2

An n × n matrix is called an elementary matrix if it can be obtained from the n × n identity matrix $I_n$
by performing a single elementary row operation.



EXAMPLE 1 Elementary Matrices and Row Operations



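Listed below are four elementary matrices and the operations that produce them.

$$\begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}$$

Multiply the second row of $I_2$ by −3.

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

Interchange the second and fourth rows of $I_4$.

$$\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Add 3 times the third row of $I_3$ to the first row.

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Multiply the first row of $I_3$ by 1.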


The following theorem, whose proof is left as an exercise, shows that when a matrix A is multiplied on the
left by an elementary matrix E, the effect is to perform an elementary row operation on A.

□ 



THEOREM 1.5.1 Row Operations by Matrix Multiplication 

If the elementary matrix E results from performing a certain row operation on $I_m$ and if A is an m × n
matrix, then the product EA is the matrix that results when this same row operation is performed on A.



EXAMPLE 2 Using Elementary Matrices

Consider the matrix

$$A = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 1 & 4 & 4 & 0 \end{bmatrix}$$

and consider the elementary matrix

$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}$$

which results from adding 3 times the first row of $I_3$ to the third row. The product EA is

$$EA = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 4 & 4 & 10 & 9 \end{bmatrix}$$

which is precisely the same matrix that results when we add 3 times the first row of A to the third
row.

Theorem 1.5.1 will be a useful tool for 
developing new results about matrices, 
but as a practical matter it is usually 
preferable to perform row operations 
directly. 
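
Theorem 1.5.1 can also be checked numerically. The following is a minimal Python sketch, not part of the original text: it builds the elementary matrix E of Example 2 from the identity matrix and compares EA with the result of applying the row operation to A directly. The helper names `identity`, `matmul`, and `add_multiple` are our own, chosen for illustration only.

```python
from fractions import Fraction

def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add_multiple(M, c, src, dst):
    """Return a copy of M with c times row src added to row dst (0-based indices)."""
    R = [row[:] for row in M]
    R[dst] = [d + c * s for d, s in zip(R[dst], R[src])]
    return R

# The matrix A of Example 2.
A = [[1, 0, 2, 3],
     [2, -1, 3, 6],
     [1, 4, 4, 0]]

E = add_multiple(identity(3), 3, 0, 2)  # add 3 times row 1 of I_3 to row 3
EA = matmul(E, A)                       # row operation carried out by multiplication
direct = add_multiple(A, 3, 0, 2)       # same row operation applied to A directly

print(EA == direct)
```

Exact fractions are used here so that the comparison is not affected by rounding; this is a design choice of the sketch rather than something the theorem requires.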



We know from the discussion at the beginning of this section that if E is an elementary matrix that results from
performing an elementary row operation on an identity matrix I, then there is a second elementary row
operation, which when applied to E, produces I back again. Table 1 lists these operations. The operations on the
right side of the table are called the inverse operations of the corresponding operations on the left.



Table 1

Row Operation on I That Produces E      Row Operation on E That Reproduces I
Multiply row i by c ≠ 0                 Multiply row i by 1/c
Interchange rows i and j                Interchange rows i and j
Add c times row i to row j              Add −c times row i to row j



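EXAMPLE 3 Row Operations and Inverse Row Operations

In each of the following, an elementary row operation is applied to the 2 × 2 identity matrix to
obtain an elementary matrix E, then E is restored to the identity matrix by applying the inverse row
operation.

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 7 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

First step: multiply the second row by 7. Second step: multiply the second row by 1/7.

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

First step: interchange the first and second rows. Second step: interchange the first and second rows.

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

First step: add 5 times the second row to the first. Second step: add −5 times the second row to the
first.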



The next theorem is a key result about invertibility of elementary matrices. It will be a building block for many 
results that follow. 



THEOREM 1.5.2 



Every elementary matrix is invertible, and the inverse is also an elementary matrix. 



Proof If E is an elementary matrix, then E results by performing some row operation on I. Let $E_0$ be the
matrix that results when the inverse of this operation is performed on I. Applying Theorem 1.5.1 and using the
fact that inverse row operations cancel the effect of each other, it follows that

$$E_0 E = I \quad \text{and} \quad E E_0 = I$$

Thus, the elementary matrix $E_0$ is the inverse of E.
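
As an informal check of Theorem 1.5.2, the Python sketch below builds the three elementary matrices of Example 3 together with the matrices produced by the inverse operations of Table 1, and verifies that each pair multiplies to the identity in both orders. The function names `scale`, `swap`, and `add_multiple` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def scale(n, i, c):
    """Elementary matrix: multiply row i of I_n by the nonzero constant c."""
    E = identity(n)
    E[i][i] = Fraction(c)
    return E

def swap(n, i, j):
    """Elementary matrix: interchange rows i and j of I_n."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def add_multiple(n, c, src, dst):
    """Elementary matrix: add c times row src of I_n to row dst (src != dst)."""
    E = identity(n)
    E[dst][src] += Fraction(c)
    return E

# Each pair is (E, E0), where E0 comes from the inverse operation in Table 1.
pairs = [
    (scale(2, 1, 7), scale(2, 1, Fraction(1, 7))),
    (swap(2, 0, 1), swap(2, 0, 1)),
    (add_multiple(2, 5, 1, 0), add_multiple(2, -5, 1, 0)),
]

all_inverse = all(
    matmul(E0, E) == identity(2) and matmul(E, E0) == identity(2)
    for E, E0 in pairs
)
print(all_inverse)
```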

Equivalence Theorem 

One of our objectives as we progress through this text is to show how seemingly diverse ideas in linear algebra 
are related. The following theorem, which relates results we have obtained about invertibility of matrices, 
homogeneous linear systems, reduced row echelon forms, and elementary matrices, is our first step in that 
direction. As we study new topics, more statements will be added to this theorem. 

THEOREM 1.5.3 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent, that is, all true or all false.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.




This makes it evident visually that the validity 



of any one statement implies the validity of all 
the others, and hence that the falsity of any one 
implies the falsity of the others. 



Proof We will prove the equivalence by establishing the chain of implications:

(a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a)

(a) ⇒ (b) Assume A is invertible and let $x_0$ be any solution of Ax = 0. Multiplying both sides of this equation by the
matrix $A^{-1}$ gives $A^{-1}(Ax_0) = A^{-1}0$, or $(A^{-1}A)x_0 = 0$, or $Ix_0 = 0$, or $x_0 = 0$. Thus, Ax = 0 has only the
trivial solution.

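(b) ⇒ (c) Let Ax = 0 be the matrix form of the system

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0
\end{aligned} \tag{1}$$

and assume that the system has only the trivial solution. If we solve by Gauss-Jordan elimination, then the
system of equations corresponding to the reduced row echelon form of the augmented matrix will be

$$\begin{aligned}
x_1 &= 0 \\
x_2 &= 0 \\
&\;\;\vdots \\
x_n &= 0
\end{aligned} \tag{2}$$

Thus the augmented matrix

$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & 0 \\
a_{21} & a_{22} & \cdots & a_{2n} & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & 0
\end{bmatrix}$$

for (1) can be reduced to the augmented matrix

$$\begin{bmatrix}
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}$$

for (2) by a sequence of elementary row operations. If we disregard the last column (all zeros) in each of these
matrices, we can conclude that the reduced row echelon form of A is $I_n$.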

(c) ⇒ (d) Assume that the reduced row echelon form of A is $I_n$, so that A can be reduced to $I_n$ by a finite
sequence of elementary row operations. By Theorem 1.5.1, each of these operations can be accomplished by
multiplying on the left by an appropriate elementary matrix. Thus we can find elementary matrices
$E_1, E_2, \ldots, E_k$ such that

$$E_k \cdots E_2 E_1 A = I_n \tag{3}$$

By Theorem 1.5.2, $E_1, E_2, \ldots, E_k$ are invertible. Multiplying both sides of Equation (3) on the left successively
by $E_k^{-1}, \ldots, E_2^{-1}, E_1^{-1}$ we obtain

$$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} I_n = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{4}$$

By Theorem 1.5.2, this equation expresses A as a product of elementary matrices.

(d) ⇒ (a) If A is a product of elementary matrices, then from Theorem 1.4.7 and Theorem 1.5.2, the matrix A
is a product of invertible matrices and hence is invertible.



A Method for Inverting Matrices 

As a first application of Theorem 1.5.3, we will develop a procedure (or algorithm) that can be used to tell
whether a given matrix is invertible, and if so, produce its inverse. To derive this algorithm, assume for the
moment that A is an invertible n × n matrix. In Equation (3), the elementary matrices execute a sequence of row
operations that reduce A to $I_n$. If we multiply both sides of this equation on the right by $A^{-1}$ and simplify, we
obtain

$$A^{-1} = E_k \cdots E_2 E_1 I_n$$

But this equation tells us that the same sequence of row operations that reduces A to $I_n$ will transform $I_n$ to
$A^{-1}$. Thus, we have established the following result.



Inversion Algorithm 



To find the inverse of an invertible matrix A, find a sequence of elementary row operations that reduces
A to the identity and then perform that same sequence of operations on $I_n$ to obtain $A^{-1}$.



A simple method for carrying out this procedure is given in the following example. 

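EXAMPLE 4 Using Row Operations to Find $A^{-1}$

Find the inverse of

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$$

Solution We want to reduce A to the identity matrix by row operations and simultaneously
apply these operations to I to produce $A^{-1}$. To accomplish this we will adjoin the identity matrix
to the right side of A, thereby producing a partitioned matrix of the form

$$[A \mid I]$$

Then we will apply row operations to this matrix until the left side is reduced to I; these
operations will convert the right side to $A^{-1}$, so the final matrix will have the form

$$[I \mid A^{-1}]$$

The computations are as follows:

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 3 & 0 & 1 & 0 \\ 1 & 0 & 8 & 0 & 0 & 1 \end{array}\right]$$

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & -2 & 5 & -1 & 0 & 1 \end{array}\right]$$

We added −2 times the first row to the second and −1 times the first row to the third.

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & -1 & -5 & 2 & 1 \end{array}\right]$$

We added 2 times the second row to the third.

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We multiplied the third row by −1.

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & -14 & 6 & 3 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We added 3 times the third row to the second and −3 times the third row to the first.

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -40 & 16 & 9 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We added −2 times the second row to the first.

Thus,

$$A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$$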



Often it will not be known in advance if a given n × n matrix A is invertible. However, if it is not, then by parts
(a) and (c) of Theorem 1.5.3 it will be impossible to reduce A to $I_n$ by elementary row operations. This will be
signaled by a row of zeros appearing on the left side of the partition at some stage of the inversion algorithm. If 
this occurs, then you can stop the computations and conclude that A is not invertible. 
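
The inversion algorithm can be sketched in code as follows. This is a minimal illustration under our own assumptions, not the text's implementation: the function name `invert` is hypothetical, exact fractions are used to avoid rounding, and a row interchange is performed whenever a pivot position holds a zero. The function reduces $[A \mid I]$ and returns `None` when no pivot can be found in some column, which corresponds to a row of zeros appearing on the left side of the partition.

```python
from fractions import Fraction

def invert(A):
    """Reduce [A | I] by Gauss-Jordan elimination.

    Returns the inverse of the square matrix A as a list of rows,
    or None if A cannot be reduced to the identity (A is not invertible).
    """
    n = len(A)
    # Adjoin the identity matrix to the right side of A.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a nonzero entry in this column at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                       # left side cannot reach I
        M[col], M[pivot] = M[pivot], M[col]   # interchange two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]      # multiply the row by 1/p
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                # Add -f times the pivot row to row r.
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]             # right side is now the inverse

print(invert([[1, 2, 3], [2, 5, 3], [1, 0, 8]]))
print(invert([[1, 6, 4], [2, 4, -1], [-1, 2, 5]]))
```

The two calls use the matrices of Example 4 and Example 5. The order of row operations chosen by the sketch differs from the hand computation in Example 4, but the inverse of a matrix is unique, so the final result is the same.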

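EXAMPLE 5 Showing That a Matrix Is Not Invertible

Consider the matrix

$$A = \begin{bmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{bmatrix}$$

Applying the procedure of Example 4 yields

$$\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 2 & 4 & -1 & 0 & 1 & 0 \\ -1 & 2 & 5 & 0 & 0 & 1 \end{array}\right]$$

$$\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 8 & 9 & 1 & 0 & 1 \end{array}\right]$$

We added −2 times the first row to the second and added the first row to the third.

$$\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right]$$

We added the second row to the third.

Since we have obtained a row of zeros on the left side, A is not invertible.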



EXAMPLE 6 Analyzing Homogeneous Systems

Use Theorem 1.5.3 to determine whether the given homogeneous system has nontrivial solutions.

(a) $x_1 + 2x_2 + 3x_3 = 0$
    $2x_1 + 5x_2 + 3x_3 = 0$
    $x_1 + 8x_3 = 0$

(b) $x_1 + 6x_2 + 4x_3 = 0$
    $2x_1 + 4x_2 - x_3 = 0$
    $-x_1 + 2x_2 + 5x_3 = 0$

Solution From parts (a) and (b) of Theorem 1.5.3 a homogeneous linear system has only the 
trivial solution if and only if its coefficient matrix is invertible. From Example 4 and Example 5 
the coefficient matrix of system (a) is invertible and that of system (b) is not. Thus, system (a) has 
only the trivial solution whereas system (b) has nontrivial solutions. 



Concept Review 

• Row equivalent matrices 

• Elementary matrix 

• Inverse operations 

• Inversion algorithm 

Skills 

• Determine whether a given square matrix is elementary.

• Determine whether two square matrices are row equivalent. 

• Apply the inverse of a given elementary row operation to a matrix.

• Apply elementary row operations to reduce a given square matrix to the identity matrix.



• Understand the relationships between statements that are equivalent to the invertibility of a square 
matrix (Theorem 1.5.3). 

• Use the inversion algorithm to find the inverse of an invertible matrix. 

• Express an invertible matrix as a product of elementary matrices. 



Exercise Set 1.5

1. Decide whether each matrix below is an elementary matrix. 



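(a) $\begin{bmatrix} 1 & 0 \\ 5 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 5 & 1 \\ 1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 2 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$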

Answer: 



(a) Elementary 

(b) Not elementary 

(c) Not elementary 

(d) Not elementary 

2. Decide whether each matrix below is an elementary matrix. 



(a) 



(b) 



(c) 



(d) 



1 0 

0 {3 

0 0 1 
0 1 0 
0 



1 

1 0 
0 1 
0 0 



-10 0 
0 0 1 
0 1 0 



3. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to
the identity matrix.



(a) 



(b) 



(c) 



(d) 



1 -3 



0 



1 



-7 0 0 
0 1 0 



0 
0 

1 

0 



0 

1 

0 
-5 

0 0 10 

0 10 0 
10 0 0 
0 0 0 1 



Answer: 



(a) 



Add 3 times row 2 to row 1 



[-1 



(b) 



Multiply row 1 by 



1. 



1 



-7 0 



(c) 



Add 5 times row 1 to row 3: 



(d) 



Swap rows 1 and 3: 



0 1 0 
0 0 1 

1 0 0 
0 1 0 
5 0 1 

0 0 10 
0 10 0 
10 0 0 
0 0 0 1 



4. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to
the identity matrix.



[-1 ?] 



(b) 



(c) 



1 0 0 

0 1 0 

0 0 3 

0 0 0 1 

0 10 0 

0 0 10 

10 0 0 



(d) 



1 


0 


1 


0 




7 




0 


1 


0 


0 


0 


0 


1 


0 


0 


0 


0 


1 



5. In each part, an elementary matrix E and a matrix A are given. Write down the row operation corresponding 
to E and show that the product EA results from applying the row operation to A.

(a) 



(b) 



B = 



(c) 



E = 



0 
1 

-3 

0 4 



A = 



2-1 0-4 
1-3-1 5 
2 0 13 



-4 
3 
-1 





'1 4" 


. A = 


2 5 




3 6 



Answer: 



(a) 



Swap rows 1 and 2: £4 



3 _6 -6 -6] 
[-1 -2 5 -ij 



(b) 



Add —3 times row 2 to row 3: EA = 



(c) 



-6 -6 -( 
2 5 

2-1 0-4-4 
1-3-1 5 3 
-19 4 -12 -10 

13 28 



Add 4 times row 3 to row 1 : EA = 



2 5 

3 6 



6. In each part, an elementary matrix E and a matrix A are given. Write down the row operation corresponding 
to E and show that the product EA results from applying the row operation to A. 



(a) 



E = 



(b) 



E = 



(c) 



E = 



-6 

0 

1 

-4 

0 

1 0 

0 5 
0 0 



0 

1 

0 0 

1 0 
0 1 

0 

0 

1 



A = 



A = 



A = 



-1 -2 

3 -6 

2 -1 

1 -3 

2 0 

1 4 

2 5 

3 6 



5 -1 
-6 -6 

0 -4 
-1 5 

1 3 



-4 
3 
-1 



In Exercises 7-8, use the following matrices. 







4 


1 


A = 


r> 










1 


5 






4 


1 


C = 


0 








0 








"8 


1 5" 




F = 


8 


1 1 






3 


4 1 





, B = 



. D = 



8 1 5 

2 -7 -1 

3 4 1 

■815" 

-6 21 3 
3 4 1 



7. Find an elementary matrix E that satisfies the equation. 

(a) EA = B 

(b) EB = A

(c) EA = C

(d) EC = A

Answer: 



(a) 



(b) 



(c) 



(d) 



1 
0 
0 

1 

0 1 0 

1 0 0 

10 0 
0 1 0 
-2 0 1 

10 0 
0 1 0 

2 0 1 



8. Find an elementary matrix E that satisfies the equation. 

(a) EB = D

(b) ED = B

(c) EB = F

(d) EF = B

In Exercises 9-24, use the inversion algorithm to find the inverse of the given matrix, if the inverse exists. 



Answer: 



[ 

"•[-3 -a 

Answer: 



-7 4 

2 -1 

-3 6 

4 5 



2 3 

7 7 

1 i 

7 7 



12, 



13. 



l-S 11 

3 4 -1 

1 0 3 

2 5-4 



Answer: 

3 
2 
-1 
i 
"2 



li 
10 
1 
1_ 
10 



14. 



15. 



12 0 
2 1 2 
0 2 1 

-1 3 -4 
2 4 1 
-4 2 -9 

Answer: 

No inverse 



6 
5 
1 
2 
5 



16. 



17. 



1 




1 


2 


5 




5 


5 


1 




1 


1 


5 




5 


10 


1 




4 


1 


5 




"5 


10 


1 


0 


r 




0 


1 


1 




1 


1 


0 





Answer: 



18. 



1 


1 


1 


2 


2 


2 


1 


1 


1 


2 


2 


2 


1 


i 


_i 

0 



19. 



/2 3/2 0 

-4/2 1/2 0 

0 0 1 

2 6 6 
2 7 6 
2 7 7 



Answer: 



20. 



21. 



i «- 


-3 




-1 1 


0 




0 -1 


1 




1 0 0 0" 






13 0 0 






13 5 0 






13 5 7 






2-4 0 




0 


1 2 12 




0 


0 0 2 




0 


0-1-4 




-5 



Answer: 



22. 



1 

4 




1 
2 


-3 


1 




1 


3 


8 




4 


2 


0 




0 


1 

2 


1 




1 


1 


40 




20 


10 


-8 


17 


2 


r 
3 


4 


0 


2 
5 


-9 


0 


0 


0 


0 


-1 


13 


4 


2 



1 


0 


1 


0 


2 


3 


-2 


6 


0 


-1 


2 


0 


0 


0 


1 


5 



Answer: 



7 


5 


5 


1 


12 


24 


8 


~4 


5 


5 


1 


1 


6 


12 


4 


2 


5 


5 


5 


1 


12 


24 


8 


4 


1 


1 


1 


1 


12 


24 


8 


4 



0 


0 


2 


0 


1 


0 


0 


1 


0 


-1 


3 


0 


2 


1 


5 


-3 



In Exercises 25-26, find the inverse of each of the following 4 × 4 matrices, where $k_1$, $k_2$, $k_3$, $k_4$, and k are
all nonzero.



25. 



(a) 



(b) 



*1 


0 


0 


0 


0 




0 


0 


0 


0 


*3 


0 


0 


0 


0 




k 


1 0 


0" 




0 


1 0 


0 




0 


0 k 


1 




0 


0 0 


1 





Answer: 



(a) J_ 0 



*1 



26. 



(b) 


1 

k 


1 

~k 


0 


0 




0 


1 


0 


0 




0 


0 


i 

A: 






n 


n 


0 


1 

1 


(a) 


' 0 


0 


0 


kx 




0 


0 


^2 


0 




0 


^3 


0 


0 






0 


0 


0 

- 


(b) 


1 

AT 


0 0 


0 






1 


k 0 


0 






0 


1 k 


0 






0 


0 1 


k 





In Exercises 27-28, find all values of c, if any, for which the given matrix is invertible.

27. $\begin{bmatrix} c & c & c \\ 1 & c & c \\ 1 & 1 & c \end{bmatrix}$

Answer:

$c \neq 0, 1$

28. $\begin{bmatrix} c & 1 & 0 \\ 1 & c & 1 \\ 0 & 1 & c \end{bmatrix}$

In Exercises 29-32, write the given matrix as a product of elementary matrices. 

29. [-2 



Hi] 



Answer: 



30, 



31. 



I 2 2\ [o 2JL0 lj[ 0 iJLl ij 



-3 1 

2 2 

0 

5 2 

1 0 -2 
0 4 3 
0 0 1 



Answer: 



1 


0 


-2 




"1 


0 


-2 


'1 


0 


0" 


'1 


0 


0" 


0 


4 


3 




0 


1 


0 


0 


1 


3 


0 


4 


0 


0 


0 


1 




0 


0 


1 


0 


0 


1 


0 


0 


1 



32. 



1 1 0 
1 1 1 
0 1 1 



In Exercises 33-36, write the inverse of the given matrix as a product of elementary matrices. 
33. The matrix in Exercise 29. 



Answer: 

i 1 
"4 8 

i 1 
4 8 



-[-1 ;i 



i 
■4 
0 



1 0 



34. The matrix in Exercise 30. 

35. The matrix in Exercise 31.

Answer: 



'l 


0 


2 




"l 


0 


0" 


0 


1 


3 




0 


1 


0 




4 


4 






4 




0 


0 


1_ 




0 


0 


1 



[1 


0 


0' 


"1 


0 


2 


0 


1 


-3 


0 


1 


0 


[o 


0 


1 


0 


0 


1 















36. The matrix in Exercise 32. 

In Exercises 37-38, show that the given matrices A and B are row equivalent, and find a sequence of
elementary row operations that produces B from A.

37. $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 1 \\ 2 & 1 & 9 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 5 \\ 0 & 2 & -2 \\ 1 & 1 & 4 \end{bmatrix}$

Answer:

Add −1 times the first row to the second row. Add −1 times the first row to the third row. Add −1 times
the second row to the first row. Add the second row to the third row.



38. 



A = 



2 
-1 

3 



1 0 
1 0 
0 -1 



B = 



6 9 4 
-5 -1 0 
-1 -2 -1 



39. Show that if 



A = 



1 0 0 
0 1 0 

a h c 



is an elementary matrix, then at least one entry in the third row must be a zero. 



40. Show that 



A 



0 
b 
0 
0 



0 
d 
0 



a 



0 
/ 
0 



c 



0 



0 



0 
0 



e 



g 
0 



0 

0 
0 



0 



0 



h 



is not invertible for any values of the entries. 

41. Prove that if A and B are m × n matrices, then A and B are row equivalent if and only if A and B have the
same reduced row echelon form. 

42. Prove that if A is an invertible matrix and B is row equivalent to A, then B is also invertible. 

43. Show that if B is obtained from A by performing a sequence of elementary row operations, then there is a 
second sequence of elementary row operations, which when applied to B recovers A. 

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The product of two elementary matrices of the same size must be an elementary matrix. 
Answer: 

False 

(b) Every elementary matrix is invertible. 
Answer: 

True 

(c) If A and B are row equivalent, and if B and C are row equivalent, then A and C are row equivalent. 
Answer: 

True 

(d) If A is an n × n matrix that is not invertible, then the linear system Ax = 0 has infinitely many solutions.
Answer: 

True 

(e) If A is an n × n matrix that is not invertible, then the matrix obtained by interchanging two rows of A cannot
be invertible. 

Answer: 

True 



(f) If A is invertible and a multiple of the first row of A is added to the second row, then the resulting matrix is 
invertible. 

Answer: 

True 

(g) An expression of the invertible matrix A as a product of elementary matrices is unique.
Answer: 

False 

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



1.6 More on Linear Systems and Invertible Matrices

In this section we will show how the inverse of a matrix can be used to solve a linear system and we will develop some more results about 
invertible matrices. 



Number of Solutions of a Linear System 

In Section 1.1 we made the statement (based on Figures 1.1.1 and 1.1.2) that every linear system has either no solutions, has exactly one solution, 
or has infinitely many solutions. We are now in a position to prove this fundamental result. 



THEOREM 1.6.1 

A system of linear equations has zero, one, or infinitely many solutions. There are no other possibilities. 

Proof If Ax = b is a system of linear equations, exactly one of the following is true: (a) the system has no solutions, (b) the system has exactly
one solution, or (c) the system has more than one solution. The proof will be complete if we can show that the system has infinitely many solutions
in case (c).

Assume that Ax = b has more than one solution, and let $x_0 = x_1 - x_2$, where $x_1$ and $x_2$ are any two distinct solutions. Because $x_1$ and $x_2$ are
distinct, the matrix $x_0$ is nonzero; moreover,

$$Ax_0 = A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0$$

If we now let k be any scalar, then

$$A(x_1 + kx_0) = Ax_1 + A(kx_0) = Ax_1 + k(Ax_0) = b + k0 = b + 0 = b$$

But this says that $x_1 + kx_0$ is a solution of Ax = b. Since $x_0$ is nonzero and there are infinitely many choices for k, the system Ax = b has
infinitely many solutions.
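
The construction in this proof can be tried on a small system. The Python sketch below uses a 2 × 2 system of our own choosing (it does not appear in the text) that has two distinct solutions, forms $x_0 = x_1 - x_2$, and checks that $x_1 + kx_0$ solves the system for several values of k. The helper `matvec` is a hypothetical name used only here.

```python
def matvec(M, v):
    """Multiply the matrix M (list of rows) by the column vector v (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Illustrative system (an assumption of this sketch): x + y = 2, 2x + 2y = 4.
A = [[1, 1],
     [2, 2]]
b = [2, 4]

x1 = [2, 0]   # one solution
x2 = [0, 2]   # a second, distinct solution
x0 = [p - q for p, q in zip(x1, x2)]   # nonzero, and A x0 = 0

# Every x1 + k*x0 is again a solution, as in the proof of Theorem 1.6.1.
solutions = [[p + k * q for p, q in zip(x1, x0)] for k in range(-3, 4)]
all_solve = all(matvec(A, s) == b for s in solutions)

print(matvec(A, x0), all_solve)
```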



Solving Linear Systems by Matrix Inversion 

Thus far we have studied two procedures for solving linear systems: Gauss-Jordan elimination and Gaussian elimination. The following theorem
provides an actual formula for the solution of a linear system of n equations in n unknowns in the case where the coefficient matrix is invertible. 

THEOREM 1.6.2 

If A is an invertible n × n matrix, then for each n × 1 matrix b, the system of equations Ax = b has exactly one solution, namely, $x = A^{-1}b$.

Proof Since $A(A^{-1}b) = b$, it follows that $x = A^{-1}b$ is a solution of Ax = b. To show that this is the only solution, we will assume that $x_0$ is an
arbitrary solution and then show that $x_0$ must be the solution $A^{-1}b$.

If $x_0$ is any solution of Ax = b, then $Ax_0 = b$. Multiplying both sides of this equation by $A^{-1}$, we obtain $x_0 = A^{-1}b$.

EXAMPLE 1 Solution of a Linear System Using $A^{-1}$

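Consider the system of linear equations

$$\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 5 \\
2x_1 + 5x_2 + 3x_3 &= 3 \\
x_1 + 8x_3 &= 17
\end{aligned}$$

In matrix form this system can be written as Ax = b, where

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad b = \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix}$$

In Example 4 of the preceding section, we showed that A is invertible and

$$A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$$

By Theorem 1.6.2, the solution of the system is

$$x = A^{-1}b = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$$

or $x_1 = 1$, $x_2 = -1$, $x_3 = 2$.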

Keep in mind that the method of Example 1 only applies when the 
system has as many equations as unknowns and the coefficient 
matrix is invertible. 
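
The computation in Example 1 can be reproduced with a few lines of Python. This is only a sketch: the inverse is typed in from Example 4 of Section 1.5 rather than computed, and `matvec` is a helper name of our own. The last line substitutes the result back into the system as a check.

```python
def matvec(M, v):
    """Multiply the matrix M (list of rows) by the column vector v (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 2, 3],
     [2, 5, 3],
     [1, 0, 8]]

# Inverse of A as found in Example 4 of Section 1.5.
A_inv = [[-40, 16, 9],
         [13, -5, -3],
         [5, -2, -1]]

b = [5, 3, 17]

x = matvec(A_inv, b)      # x = A^{-1} b, by Theorem 1.6.2
check = matvec(A, x)      # substituting back should reproduce b

print(x, check == b)
```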



Linear Systems with a Common Coefficient Matrix 

Frequently, one is concerned with solving a sequence of systems

$$Ax = b_1, \quad Ax = b_2, \quad Ax = b_3, \ldots, \quad Ax = b_k$$

each of which has the same square coefficient matrix A. If A is invertible, then the solutions

$$x_1 = A^{-1}b_1, \quad x_2 = A^{-1}b_2, \quad x_3 = A^{-1}b_3, \ldots, \quad x_k = A^{-1}b_k$$

can be obtained with one matrix inversion and k matrix multiplications. An efficient way to do this is to form the partitioned matrix

$$[A \mid b_1 \mid b_2 \mid \cdots \mid b_k] \tag{1}$$

in which the coefficient matrix A is "augmented" by all k of the matrices $b_1, b_2, \ldots, b_k$, and then reduce (1) to reduced row echelon form by
Gauss-Jordan elimination. In this way we can solve all k systems at once. This method has the added advantage that it applies even when A is not
invertible.

EXAMPLE 2 Solving Two Linear Systems at Once

Solve the systems

(a) $x_1 + 2x_2 + 3x_3 = 4$
    $2x_1 + 5x_2 + 3x_3 = 5$
    $x_1 + 8x_3 = 9$

(b) $x_1 + 2x_2 + 3x_3 = 1$
    $2x_1 + 5x_2 + 3x_3 = 6$
    $x_1 + 8x_3 = -6$

Solution The two systems have the same coefficient matrix. If we augment this coefficient matrix with the columns of constants on
the right sides of these systems, we obtain

$$\left[\begin{array}{ccc|c|c} 1 & 2 & 3 & 4 & 1 \\ 2 & 5 & 3 & 5 & 6 \\ 1 & 0 & 8 & 9 & -6 \end{array}\right]$$

Reducing this matrix to reduced row echelon form yields (verify)

$$\left[\begin{array}{ccc|c|c} 1 & 0 & 0 & 1 & 2 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & -1 \end{array}\right]$$

It follows from the last two columns that the solution of system (a) is $x_1 = 1$, $x_2 = 0$, $x_3 = 1$ and the solution of system (b) is $x_1 = 2$,
$x_2 = 1$, $x_3 = -1$.
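
The "verify" step of Example 2 can be carried out mechanically. The following Python sketch is an illustration under our own assumptions (the function name `rref` and the use of exact fractions are not from the text): it reduces the doubly augmented matrix to reduced row echelon form and reads the two solutions from the last two columns.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows)."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    lead = 0                                   # row where the next leading 1 goes
    for col in range(cols):
        if lead == rows:
            break
        pivot = next((r for r in range(lead, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue                           # no leading 1 in this column
        M[lead], M[pivot] = M[pivot], M[lead]  # interchange rows
        p = M[lead][col]
        M[lead] = [x / p for x in M[lead]]     # create the leading 1
        for r in range(rows):
            if r != lead and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[lead])]
        lead += 1
    return M

# Coefficient matrix augmented by the right sides of systems (a) and (b).
aug = [[1, 2, 3, 4, 1],
       [2, 5, 3, 5, 6],
       [1, 0, 8, 9, -6]]

R = rref(aug)
sol_a = [row[3] for row in R]   # fourth column: solution of system (a)
sol_b = [row[4] for row in R]   # fifth column: solution of system (b)

print(sol_a, sol_b)
```

Reading a solution directly from a column, as done here, relies on the left block reducing to the identity; when A is not invertible the reduced matrix has to be interpreted row by row instead.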



Properties of Invertible Matrices 

Up to now, to show that an n × n matrix A is invertible, it has been necessary to find an n × n matrix B such that

AB = I and BA = I

The next theorem shows that if we produce an n × n matrix B satisfying either condition, then the other condition holds automatically.

THEOREM 1.6.3 

Let A be a square matrix.

(a) If B is a square matrix satisfying BA = I, then $B = A^{-1}$.

(b) If B is a square matrix satisfying AB = I, then $B = A^{-1}$.

We will prove part (a) and leave part (b) as an exercise.

Proof (a) Assume that BA = I. If we can show that A is invertible, the proof can be completed by multiplying both sides of BA = I on the right by $A^{-1}$ to
obtain

$$BAA^{-1} = IA^{-1} \quad \text{or} \quad BI = IA^{-1} \quad \text{or} \quad B = A^{-1}$$

To show that A is invertible, it suffices to show that the system Ax = 0 has only the trivial solution (see Theorem 1.5.3). Let $x_0$ be any solution of
this system. If we multiply both sides of $Ax_0 = 0$ on the left by B, we obtain $BAx_0 = B0$ or $Ix_0 = 0$ or $x_0 = 0$. Thus, the system of equations
Ax = 0 has only the trivial solution.



Equivalence Theorem

We are now in a position to add two more statements to the four given in Theorem 1.5.3.

THEOREM 1.6.4 Equivalent Statements

If A is an n × n matrix, then the following are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) Ax = b is consistent for every n × 1 matrix b.

(f) Ax = b has exactly one solution for every n × 1 matrix b.



It follows from the equivalency of parts (e) and (f) that if you can
show that Ax = b has at least one solution for every n × 1 matrix
b, then you can conclude that it has exactly one solution for every
n × 1 matrix b.

Proof Since we proved in Theorem 1.5.3 that (a), (b), (c), and (d) are equivalent, it will be sufficient to prove that (a) ⇒ (f) ⇒ (e) ⇒ (a).

(a) ⇒ (f) This was already proved in Theorem 1.6.2.

(f) ⇒ (e) This is self-evident, for if Ax = b has exactly one solution for every n × 1 matrix b, then Ax = b is consistent for every n × 1 matrix b.

(e) ⇒ (a) If the system Ax = b is consistent for every n × 1 matrix b, then, in particular, this is so for the systems

$$Ax = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad Ax = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \ldots, \quad Ax = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$

Let $x_1, x_2, \ldots, x_n$ be solutions of the respective systems, and let us form an n × n matrix C having these solutions as columns. Thus C has the form

$$C = [x_1 \mid x_2 \mid \cdots \mid x_n]$$

As discussed in Section 1.3, the successive columns of the product AC will be

$$Ax_1, \; Ax_2, \ldots, \; Ax_n$$

[see Formula 8 of Section 1.3]. Thus,

$$AC = [Ax_1 \mid Ax_2 \mid \cdots \mid Ax_n] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I$$

By part (b) of Theorem 1.6.3, it follows that $C = A^{-1}$. Thus, A is invertible.


We know from earlier work that invertible matrix factors produce an invertible product. Conversely, the following theorem shows that if the
product of square matrices is invertible, then the factors themselves must be invertible.



THEOREM 1.6.5 

Let A and B be square matrices of the same size. If AB is invertible, then A and B must also be invertible. 

In our later work the following fundamental problem will occur frequently in various contexts. 

A Fundamental Problem

Let A be a fixed m × n matrix. Find all m × 1 matrices b such that the system of equations Ax = b is consistent.



If A is an invertible matrix, Theorem 1.6.2 completely solves this problem by asserting that for every m × 1 matrix b, the linear system Ax = b has
the unique solution $x = A^{-1}b$. If A is not square, or if A is square but not invertible, then Theorem 1.6.2 does not apply. In these cases the matrix b
must usually satisfy certain conditions in order for Ax = b to be consistent. The following example illustrates how the methods of Section 1.2 can
be used to determine such conditions.

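EXAMPLE 3 Determining Consistency by Elimination

What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

$$\begin{aligned}
x_1 + x_2 + 2x_3 &= b_1 \\
x_1 + x_3 &= b_2 \\
2x_1 + x_2 + 3x_3 &= b_3
\end{aligned}$$

to be consistent?

Solution The augmented matrix is

$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 1 & 0 & 1 & b_2 \\ 2 & 1 & 3 & b_3 \end{array}\right]$$

which can be reduced to row echelon form as follows:

$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & -1 & -1 & b_2 - b_1 \\ 0 & -1 & -1 & b_3 - 2b_1 \end{array}\right]$$

−1 times the first row was added to the second and −2 times the first row was added to the third.

$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_1 - b_2 \\ 0 & -1 & -1 & b_3 - 2b_1 \end{array}\right]$$

The second row was multiplied by −1.

$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_1 - b_2 \\ 0 & 0 & 0 & b_3 - b_2 - b_1 \end{array}\right]$$

The second row was added to the third.

It is now evident from the third row in the matrix that the system has a solution if and only if $b_1$, $b_2$, and $b_3$ satisfy the condition

$$b_3 - b_2 - b_1 = 0 \quad \text{or} \quad b_3 = b_1 + b_2$$

To express this condition another way, Ax = b is consistent if and only if b is a matrix of the form

$$b = \begin{bmatrix} b_1 \\ b_2 \\ b_1 + b_2 \end{bmatrix}$$

where $b_1$ and $b_2$ are arbitrary.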
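
The consistency condition found in Example 3 can be spot-checked numerically. The sketch below is our own illustration, not the text's method for deriving the condition: the hypothetical function `is_consistent` reduces the augmented matrix to a row echelon form and reports an inconsistency exactly when some row has all zero coefficients but a nonzero constant. It is then tried on right-hand sides that do and do not satisfy $b_3 = b_1 + b_2$.

```python
from fractions import Fraction

def is_consistent(A, b):
    """Decide whether Ax = b has a solution by forward elimination on [A | b]."""
    n = len(A[0])                       # number of unknowns
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    rows = len(M)
    lead = 0
    for col in range(n):                # pivot only in coefficient columns
        if lead == rows:
            break
        pivot = next((r for r in range(lead, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[lead], M[pivot] = M[pivot], M[lead]
        for r in range(lead + 1, rows):
            f = M[r][col] / M[lead][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[lead])]
        lead += 1
    # A row [0 0 ... 0 | c] with c != 0 signals an inconsistent system.
    return all(any(x != 0 for x in row[:n]) or row[n] == 0 for row in M)

# Coefficient matrix of Example 3.
A = [[1, 1, 2],
     [1, 0, 1],
     [2, 1, 3]]

print(is_consistent(A, [1, 2, 3]))   # b3 = b1 + b2 holds
print(is_consistent(A, [1, 2, 4]))   # b3 = b1 + b2 fails
```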



EXAMPLE 4 Determining Consistency by Elimination

What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

$$\begin{aligned}
x_1 + 2x_2 + 3x_3 &= b_1 \\
2x_1 + 5x_2 + 3x_3 &= b_2 \\
x_1 + 8x_3 &= b_3
\end{aligned}$$

to be consistent?

Solution The augmented matrix is

$$\left[\begin{array}{ccc|c} 1 & 2 & 3 & b_1 \\ 2 & 5 & 3 & b_2 \\ 1 & 0 & 8 & b_3 \end{array}\right]$$

Reducing this to reduced row echelon form yields (verify)

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & -40b_1 + 16b_2 + 9b_3 \\ 0 & 1 & 0 & 13b_1 - 5b_2 - 3b_3 \\ 0 & 0 & 1 & 5b_1 - 2b_2 - b_3 \end{array}\right] \tag{2}$$

In this case there are no restrictions on $b_1$, $b_2$, and $b_3$, so the system has the unique solution

$$x_1 = -40b_1 + 16b_2 + 9b_3, \quad x_2 = 13b_1 - 5b_2 - 3b_3, \quad x_3 = 5b_1 - 2b_2 - b_3 \tag{3}$$

for all values of $b_1$, $b_2$, and $b_3$.

What does the result in Example 4 tell you about the coefficient 
matrix of the system? 



Skills 

• Determine whether a linear system of equations has no solutions, exactly one solution, or infinitely many solutions. 

• Solve a linear system by inverting its coefficient matrix.

• Solve multiple linear systems with the same coefficient matrix simultaneously. 



• Be familiar with the additional conditions of invertibility stated in the Equivalence Theorem. 



Exercise Set 1.6

In Exercises 1-8, solve the system by inverting the coefficient matrix and using Theorem 1.6.2.

1. $x_1 + x_2 = 2$
$5x_1 + 6x_2 = 9$

Answer:

$x_1 = 3$, $x_2 = -1$

2. $4x_1 - 3x_2 = -3$
$2x_1 - 5x_2 = 9$
3. $x_1 + 3x_2 + x_3 = 4$
$2x_1 + 2x_2 + x_3 = -1$
$2x_1 + 3x_2 + x_3 = 3$

Answer:

$x_1 = -1$, $x_2 = 4$, $x_3 = -7$
4. $5x_1 + 3x_2 + 2x_3 = 4$
$3x_1 + 3x_2 + 2x_3 = 2$
$x_2 + x_3 = 5$

5. $x + y + z = 5$
$x + y - 4z = 10$
$-4x + y + z = 0$

Answer:

$x = 1$, $y = 5$, $z = -1$

6. $-x - 2y - 3z$
$w + x + 4y + 4z$
$w + 3x + 7y + 9z$
$-w - 2x - 4y - 6z$

7. $3x_1 + 5x_2 = b_1$
$x_1 + 2x_2 = b_2$

Answer:

$x_1 = 2b_1 - 5b_2$, $x_2 = -b_1 + 3b_2$

8. $x_1 + 2x_2 + 3x_3 = b_1$
$2x_1 + 5x_2 + 5x_3 = b_2$
$3x_1 + 5x_2 + 8x_3 = b_3$

In Exercises 9-12, solve the linear systems together by reducing the appropriate augmented matrix. 

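9. $x_1 - 5x_2 = b_1$
$3x_1 + 2x_2 = b_2$

(i) $b_1 = 1$, $b_2 = 4$

(ii) $b_1 = -2$, $b_2 = 5$

Answer:

(i) $x_1 = \frac{22}{17}$, $x_2 = \frac{1}{17}$

(ii) $x_1 = \frac{21}{17}$, $x_2 = \frac{11}{17}$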




10. $-x_1 + 4x_2 + x_3 = b_1$

$6x_1 + 4x_2 - 8x_3 = b_3$

(i) $b_1 = 0$, $b_2 = 1$, $b_3 = 0$

(ii) $b_1 = -3$, $b_2 = 4$, $b_3 = -5$

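11. $4x_1 - 7x_2 = b_1$
$x_1 + 2x_2 = b_2$

(i) $b_1 = 0$, $b_2 = 1$

(ii) $b_1 = -4$, $b_2 = 6$

(iii) $b_1 = -1$, $b_2 = 3$

(iv) $b_1 = -5$, $b_2 = 1$

Answer:

(i) $x_1 = \frac{7}{15}$, $x_2 = \frac{4}{15}$

(ii) $x_1 = \frac{34}{15}$, $x_2 = \frac{28}{15}$

(iii) $x_1 = \frac{19}{15}$, $x_2 = \frac{13}{15}$

(iv) $x_1 = -\frac{1}{5}$, $x_2 = \frac{3}{5}$

12. $x_1 + 3x_2 + 5x_3 = b_1$
$-x_1 - 2x_2 = b_2$
$2x_1 + 5x_2 + 4x_3 = b_3$

(i) $b_1 = 1$, $b_2 = 0$, $b_3 = -1$

(ii) $b_1 = 0$, $b_2 = 1$, $b_3 = 1$

(iii) $b_1 = -1$, $b_2 = -1$, $b_3 = 0$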

In Exercises 13-17, determine conditions on the $b_i$'s, if any, in order to guarantee that the linear system is consistent.

13. $x_1 + 3x_2 = b_1$
$-2x_1 + x_2 = b_2$

Answer:

No conditions on $b_1$ and $b_2$

14. $6x_1 - 4x_2 = b_1$
$3x_1 - 2x_2 = b_2$

15. $x_1 - 2x_2 + 5x_3 = b_1$
$4x_1 - 5x_2 + 8x_3 = b_2$
$-3x_1 + 3x_2 - 3x_3 = b_3$

Answer:

$b_3 = b_1 - b_2$

16. $x_1 - 2x_2 - x_3 = b_1$
$-4x_1 + 5x_2 + 2x_3 = b_2$
$-4x_1 + 7x_2 + 4x_3 = b_3$

17. $x_1 - x_2 + 3x_3 + 2x_4 = b_1$
$-2x_1 + x_2 + 5x_3 + x_4 = b_2$
$-3x_1 + 2x_2 + 2x_3 - x_4 = b_3$
$4x_1 - 3x_2 + x_3 + 3x_4 = b_4$

Answer:

$b_1 = b_3 + b_4$, $b_2 = 2b_3 + b_4$
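18. Consider the matrices

$$
A = \begin{bmatrix} 2 & 1 & 2 \\ 2 & 2 & -2 \\ 3 & 1 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
$$

(a) Show that the equation $A\mathbf{x} = \mathbf{x}$ can be rewritten as $(A - I)\mathbf{x} = \mathbf{0}$ and use this result to solve $A\mathbf{x} = \mathbf{x}$ for $\mathbf{x}$.

(b) Solve $A\mathbf{x} = 4\mathbf{x}$.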

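In Exercises 19-20, solve the given matrix equation for $X$.

19.

$$
\begin{bmatrix} 1 & -1 & 1 \\ 2 & 3 & 0 \\ 0 & 2 & -1 \end{bmatrix} X = \begin{bmatrix} 2 & -1 & 5 & 7 & 8 \\ 4 & 0 & -3 & 0 & 1 \\ 3 & 5 & -7 & 2 & 1 \end{bmatrix}
$$

Answer:

$$
X = \begin{bmatrix} 11 & 12 & -3 & 27 & 26 \\ -6 & -8 & 1 & -18 & -17 \\ -15 & -21 & 9 & -38 & -35 \end{bmatrix}
$$

20.

$$
\begin{bmatrix} -2 & 0 & 1 \\ 0 & -1 & -1 \\ 1 & 1 & -4 \end{bmatrix} X = \begin{bmatrix} 4 & 3 & 2 & 1 \\ 6 & 7 & 8 & 9 \\ 1 & 3 & 7 & 9 \end{bmatrix}
$$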



21. Let $A\mathbf{x} = \mathbf{0}$ be a homogeneous system of $n$ linear equations in $n$ unknowns that has only the trivial solution. Show that if $k$ is any positive integer, then the system $A^k\mathbf{x} = \mathbf{0}$ also has only the trivial solution.

22. Let $A\mathbf{x} = \mathbf{0}$ be a homogeneous system of $n$ linear equations in $n$ unknowns, and let $Q$ be an invertible $n \times n$ matrix. Show that $A\mathbf{x} = \mathbf{0}$ has just the trivial solution if and only if $(QA)\mathbf{x} = \mathbf{0}$ has just the trivial solution.

23. Let $A\mathbf{x} = \mathbf{b}$ be any consistent system of linear equations, and let $\mathbf{x}_1$ be a fixed solution. Show that every solution to the system can be written in the form $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_0$, where $\mathbf{x}_0$ is a solution to $A\mathbf{x} = \mathbf{0}$. Show also that every matrix of this form is a solution.

24. Use part (a) of Theorem 1.6.3 to prove part (b). 

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer.

(a) It is impossible for a system of linear equations to have exactly two solutions.
Answer:

True

(b) If the linear system $A\mathbf{x} = \mathbf{b}$ has a unique solution, then the linear system $A\mathbf{x} = \mathbf{c}$ also must have a unique solution.
Answer:

True

(c) If $A$ and $B$ are $n \times n$ matrices such that $AB = I_n$, then $BA = I_n$.
Answer:

True

(d) If $A$ and $B$ are row equivalent matrices, then the linear systems $A\mathbf{x} = \mathbf{0}$ and $B\mathbf{x} = \mathbf{0}$ have the same solution set.
Answer:

True

(e) If $A$ is an $n \times n$ matrix and $S$ is an $n \times n$ invertible matrix, then if $\mathbf{x}$ is a solution to the linear system $(S^{-1}AS)\mathbf{x} = \mathbf{b}$, then $S\mathbf{x}$ is a solution to the linear system $A\mathbf{y} = S\mathbf{b}$.

Answer:

True

(f) Let $A$ be an $n \times n$ matrix. The linear system $A\mathbf{x} = 4\mathbf{x}$ has a unique solution if and only if $A - 4I$ is an invertible matrix.
Answer:

True

(g) Let $A$ and $B$ be $n \times n$ matrices. If $A$ or $B$ (or both) are not invertible, then neither is $AB$.
Answer:

True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



1.7 Diagonal, Triangular, and Symmetric Matrices 

In this section we will discuss matrices that have various special forms. These matrices arise in a wide variety of applications 
and will also play an important role in our subsequent work. 



Diagonal Matrices 



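A square matrix in which all the entries off the main diagonal are zero is called a diagonal matrix. Here are some examples:

$$
\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 6 & 0 & 0 & 0 \\ 0 & -4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 8 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
$$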



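A general $n \times n$ diagonal matrix $D$ can be written as

$$
D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} \tag{1}
$$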



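A diagonal matrix is invertible if and only if all of its diagonal entries are nonzero; in this case the inverse of (1) is

$$
D^{-1} = \begin{bmatrix} 1/d_1 & 0 & \cdots & 0 \\ 0 & 1/d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/d_n \end{bmatrix} \tag{2}
$$

Confirm Formula (2) by showing that
$DD^{-1} = D^{-1}D = I$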



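Powers of diagonal matrices are easy to compute; we leave it for you to verify that if $D$ is the diagonal matrix (1) and $k$ is a positive integer, then

$$
D^k = \begin{bmatrix} d_1^k & 0 & \cdots & 0 \\ 0 & d_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n^k \end{bmatrix} \tag{3}
$$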
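
Because a diagonal matrix is determined by its diagonal entries, Formulas (2) and (3) can be sketched as entry-by-entry operations. The helper name `diag_power` and the sample entries 1, -3, 2 below are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

def diag_power(d, k):
    """Diagonal entries of D**k for D = diag(d); k may be negative if no entry is zero."""
    if k < 0 and any(x == 0 for x in d):
        raise ValueError("D is not invertible: a diagonal entry is zero")
    # Each diagonal entry is raised to the k-th power independently.
    return [Fraction(x) ** k for x in d]

d = [1, -3, 2]              # sample diagonal entries
print(diag_power(d, -1))    # diagonal of the inverse, as in Formula (2)
print(diag_power(d, 5))     # diagonal of the fifth power, as in Formula (3)
print(diag_power(d, -5))    # diagonal of the inverse of the fifth power
```

Only the diagonal is stored here; the full matrix would carry zeros everywhere else.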



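EXAMPLE 1 Inverses and Powers of Diagonal Matrices

If

$$
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 2 \end{bmatrix}
$$

then

$$
A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{3} & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}, \quad
A^{5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -243 & 0 \\ 0 & 0 & 32 \end{bmatrix}, \quad
A^{-5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{243} & 0 \\ 0 & 0 & \frac{1}{32} \end{bmatrix}
$$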



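Matrix products that involve diagonal factors are especially easy to compute. For example,

$$
\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}
=
\begin{bmatrix} d_1a_{11} & d_1a_{12} & d_1a_{13} & d_1a_{14} \\ d_2a_{21} & d_2a_{22} & d_2a_{23} & d_2a_{24} \\ d_3a_{31} & d_3a_{32} & d_3a_{33} & d_3a_{34} \end{bmatrix}
$$

$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix}
\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}
=
\begin{bmatrix} d_1a_{11} & d_2a_{12} & d_3a_{13} \\ d_1a_{21} & d_2a_{22} & d_3a_{23} \\ d_1a_{31} & d_2a_{32} & d_3a_{33} \\ d_1a_{41} & d_2a_{42} & d_3a_{43} \end{bmatrix}
$$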





In words, to multiply a matrix A on the left by a diagonal matrix D, one can multiply successive rows of A by the 
successive diagonal entries of D, and to multiply A on the right by D, one can multiply successive columns of A by the 
successive diagonal entries of D. 
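
The row and column scaling rule just stated can be sketched as follows. The function names `left_scale` and `right_scale` are hypothetical, and the sample matrices are arbitrary illustrations; the general product `matmul` is included only to compare against.

```python
def left_scale(d, A):
    """Compute D A for D = diag(d): row i of A is multiplied by d[i]."""
    return [[d[i] * a for a in row] for i, row in enumerate(A)]

def right_scale(A, d):
    """Compute A D for D = diag(d): column j of A is multiplied by d[j]."""
    return [[a * d[j] for j, a in enumerate(row)] for row in A]

def diag(d):
    """Full square matrix with d on the main diagonal and zeros elsewhere."""
    return [[d[i] if i == j else 0 for j in range(len(d))] for i in range(len(d))]

def matmul(P, Q):
    """Ordinary matrix product, used here only as a cross-check."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

d = [3, -1, 2]
A = [[2, 1], [-4, 1], [2, 5]]        # 3 x 2, so D A is defined
B = [[1, 2, 3], [4, 5, 6]]           # 2 x 3, so B D is defined
print(left_scale(d, A) == matmul(diag(d), A))
print(right_scale(B, d) == matmul(B, diag(d)))
```

The shortcut avoids forming $D$ at all, which is why such products can be written down by inspection.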



Triangular Matrices 

A square matrix in which all the entries above the main diagonal are zero is called lower triangular, and a square matrix in 
which all the entries below the main diagonal are zero is called upper triangular. A matrix that is either upper triangular or 
lower triangular is called triangular. 

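EXAMPLE 2 Upper and Lower Triangular Matrices

$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}
$$

A general $4 \times 4$ upper triangular matrix

$$
\begin{bmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}
$$

A general $4 \times 4$ lower triangular matrix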



Remark Observe that diagonal matrices are both upper triangular and lower triangular since they have zeros below and 
above the main diagonal. Observe also that a square matrix in row echelon form is upper triangular since it has zeros below 
the main diagonal. 



Properties of Triangular Matrices 



Example 2 illustrates the following four facts about triangular matrices that we will state without formal proof. 

• A square matrix $A = [a_{ij}]$ is upper triangular if and only if all entries to the left of the main diagonal are zero; that is, $a_{ij} = 0$ if $i > j$ (Figure 1.7.1).

• A square matrix $A = [a_{ij}]$ is lower triangular if and only if all entries to the right of the main diagonal are zero; that is, $a_{ij} = 0$ if $i < j$ (Figure 1.7.1).

• A square matrix $A = [a_{ij}]$ is upper triangular if and only if the $i$th row starts with at least $i - 1$ zeros for every $i$.

• A square matrix $A = [a_{ij}]$ is lower triangular if and only if the $j$th column starts with at least $j - 1$ zeros for every $j$.

[Figure 1.7.1: entries with $i > j$ lie below the main diagonal]
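
The index conditions $a_{ij} = 0$ for $i > j$ and $a_{ij} = 0$ for $i < j$ translate directly into a test. This is a small sketch with hypothetical function names; indices in the code start at 0 rather than 1, which does not change the comparisons between $i$ and $j$.

```python
def is_upper_triangular(A):
    """True if every entry below the main diagonal (i > j) is zero."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(i))

def is_lower_triangular(A):
    """True if every entry above the main diagonal (i < j) is zero."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n))

U = [[1, 3, -1], [0, 2, 4], [0, 0, 5]]    # sample upper triangular matrix
D = [[6, 0], [0, -4]]                     # a diagonal matrix is both
print(is_upper_triangular(U), is_lower_triangular(U))
print(is_upper_triangular(D), is_lower_triangular(D))
```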

The following theorem lists some of the basic properties of triangular matrices. 


THEOREM 1.7.1 

(a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is 
lower triangular. 

(b) The product of lower triangular matrices is lower triangular, and the product of upper triangular matrices is upper 
triangular. 

(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero. 

(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper 
triangular matrix is upper triangular. 



Part (a) is evident from the fact that transposing a square matrix can be accomplished by reflecting the entries about the main diagonal; we omit the formal proof. We will prove (b), but we will defer the proofs of (c) and (d) to the next chapter, where we will have the tools to prove those results more efficiently.

Proof (b) We will prove the result for lower triangular matrices; the proof for upper triangular matrices is similar. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be lower triangular $n \times n$ matrices, and let $C = [c_{ij}]$ be the product $C = AB$. We can prove that $C$ is lower triangular by showing that $c_{ij} = 0$ for $i < j$. But from the definition of matrix multiplication,

$$
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}
$$

If we assume that $i < j$, then the terms in this expression can be grouped as follows:

$$
c_{ij} = \underbrace{a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{i(j-1)}b_{(j-1)j}}_{\text{Terms in which the row number of } b \text{ is less than the column number of } b} + \underbrace{a_{ij}b_{jj} + \cdots + a_{in}b_{nj}}_{\text{Terms in which the row number of } a \text{ is less than the column number of } a}
$$

In the first grouping all of the $b$ factors are zero since $B$ is lower triangular, and in the second grouping all of the $a$ factors are zero since $A$ is lower triangular. Thus, $c_{ij} = 0$, which is what we wanted to prove.



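EXAMPLE 3 Computations with Triangular Matrices

Consider the upper triangular matrices

$$
A = \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 4 \\ 0 & 0 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & -2 & 2 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix}
$$

It follows from part (c) of Theorem 1.7.1 that the matrix $A$ is invertible but the matrix $B$ is not. Moreover, the theorem also tells us that $A^{-1}$, $AB$, and $BA$ must be upper triangular. We leave it for you to confirm these three statements by showing that

$$
A^{-1} = \begin{bmatrix} 1 & -\frac{3}{2} & \frac{7}{5} \\ 0 & \frac{1}{2} & -\frac{2}{5} \\ 0 & 0 & \frac{1}{5} \end{bmatrix}, \quad
AB = \begin{bmatrix} 3 & -2 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 5 \end{bmatrix}, \quad
BA = \begin{bmatrix} 3 & 5 & -1 \\ 0 & 0 & -5 \\ 0 & 0 & 5 \end{bmatrix}
$$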
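
The three confirmations requested in Example 3 can be carried out by direct multiplication. The sketch below uses exact fractions; the helper `matmul` and the variable names are illustrative choices.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

A = [[1, 3, -1], [0, 2, 4], [0, 0, 5]]
B = [[3, -2, 2], [0, 0, -1], [0, 0, 1]]
# Candidate inverse of A, entered with exact fractions
A_inv = [[1, F(-3, 2), F(7, 5)], [0, F(1, 2), F(-2, 5)], [0, 0, F(1, 5)]]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(A, A_inv) == identity)   # confirms the candidate inverse
print(matmul(A, B))                   # product AB
print(matmul(B, A))                   # product BA
```

In both products every entry below the main diagonal is zero, in agreement with part (b) of Theorem 1.7.1.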



Symmetric Matrices 

DEFINITION 1

A square matrix $A$ is said to be symmetric if $A = A^T$.



It is easy to recognize a symmetric matrix by 
inspection: The entries on the main diagonal have no 
restrictions, but mirror images of entries across the 
main diagonal must be equal. Here is a picture using 
the second matrix in Example 4: 




All diagonal matrices, such as the third matrix in 
Example 4, obviously have this property. 

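EXAMPLE 4 Symmetric Matrices

The following matrices are symmetric, since each is equal to its own transpose (verify).

$$
\begin{bmatrix} 7 & -3 \\ -3 & 5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 4 & 5 \\ 4 & -3 & 0 \\ 5 & 0 & 7 \end{bmatrix}, \quad
\begin{bmatrix} d_1 & 0 & 0 & 0 \\ 0 & d_2 & 0 & 0 \\ 0 & 0 & d_3 & 0 \\ 0 & 0 & 0 & d_4 \end{bmatrix}
$$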



Remark It follows from Formula 11 of Section 1.3 that a square matrix $A = [a_{ij}]$ is symmetric if and only if

$$
(A)_{ij} = (A)_{ji} \tag{4}
$$

for all values of $i$ and $j$.

The following theorem lists the main algebraic properties of symmetric matrices. The proofs are direct consequences of 
Theorem 1.4.8 and are omitted. 



THEOREM 1.7.2 

If $A$ and $B$ are symmetric matrices with the same size, and if $k$ is any scalar, then:

(a) $A^T$ is symmetric.

(b) $A + B$ and $A - B$ are symmetric.

(c) $kA$ is symmetric.

It is not true, in general, that the product of symmetric matrices is symmetric. To see why this is so, let $A$ and $B$ be symmetric matrices with the same size. Then it follows from part (e) of Theorem 1.4.8 and the symmetry of $A$ and $B$ that

$$
(AB)^T = B^TA^T = BA
$$

Thus, $(AB)^T = AB$ if and only if $AB = BA$, that is, if and only if $A$ and $B$ commute. In summary, we have the following result.

THEOREM 1.7.3 

The product of two symmetric matrices is symmetric if and only if the matrices commute. 



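EXAMPLE 5 Products of Symmetric Matrices

The first of the following equations shows a product of symmetric matrices that is not symmetric, and the second shows a product of symmetric matrices that is symmetric. We conclude that the factors in the first equation do not commute, but those in the second equation do. We leave it for you to verify that this is so.

$$
\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}
\begin{bmatrix} -4 & 1 \\ 1 & 0 \end{bmatrix}
=
\begin{bmatrix} -2 & 1 \\ -5 & 2 \end{bmatrix}
$$

$$
\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}
\begin{bmatrix} -4 & 3 \\ 3 & -1 \end{bmatrix}
=
\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}
$$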
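
The verification left to the reader in Example 5 can be sketched in a few lines. The helper names `transpose`, `is_symmetric`, and `matmul` are illustrative; the matrices are the factors from the two equations of the example.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    """Rows of M become columns."""
    return [list(col) for col in zip(*M)]

def is_symmetric(M):
    """A matrix is symmetric when it equals its own transpose."""
    return M == transpose(M)

A = [[1, 2], [2, 3]]
B1 = [[-4, 1], [1, 0]]     # second factor in the first equation
B2 = [[-4, 3], [3, -1]]    # second factor in the second equation

for B in (B1, B2):
    AB, BA = matmul(A, B), matmul(B, A)
    # By Theorem 1.7.3 these two flags should always agree.
    print(is_symmetric(AB), AB == BA)
```

For each pair, the product is symmetric exactly when the factors commute.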



Invertibility of Symmetric Matrices 

In general, a symmetric matrix need not be invertible. For example, a diagonal matrix with a zero on the main diagonal is symmetric but not invertible. However, the following theorem shows that if a symmetric matrix happens to be invertible, then its inverse must also be symmetric.

THEOREM 1.7.4 

If $A$ is an invertible symmetric matrix, then $A^{-1}$ is symmetric.

Proof Assume that $A$ is symmetric and invertible. From Theorem 1.4.9 and the fact that $A = A^T$, we have

$$
(A^{-1})^T = (A^T)^{-1} = A^{-1}
$$

which proves that $A^{-1}$ is symmetric.



Products $AA^T$ and $A^TA$

Matrix products of the form $AA^T$ and $A^TA$ arise in a variety of applications. If $A$ is an $m \times n$ matrix, then $A^T$ is an $n \times m$ matrix, so the products $AA^T$ and $A^TA$ are both square matrices: the matrix $AA^T$ has size $m \times m$, and the matrix $A^TA$ has size $n \times n$. Such products are always symmetric since

$$
(AA^T)^T = (A^T)^TA^T = AA^T \quad\text{and}\quad (A^TA)^T = A^T(A^T)^T = A^TA
$$

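EXAMPLE 6 The Product of a Matrix and Its Transpose Is Symmetric

Let $A$ be the $2 \times 3$ matrix

$$
A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix}
$$

Then

$$
A^TA = \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix}
\begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix}
= \begin{bmatrix} 10 & -2 & -11 \\ -2 & 4 & -8 \\ -11 & -8 & 41 \end{bmatrix}
$$

$$
AA^T = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix}
\begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix}
= \begin{bmatrix} 21 & -17 \\ -17 & 34 \end{bmatrix}
$$

Observe that $A^TA$ and $AA^T$ are symmetric as expected.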
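
The computation in Example 6 can be reproduced with a short sketch. The helper names are illustrative; the matrix is the $2 \times 3$ matrix of the example.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    """Rows of M become columns."""
    return [list(col) for col in zip(*M)]

A = [[1, -2, 4], [3, 0, -5]]
AT = transpose(A)

ATA = matmul(AT, A)    # 3 x 3
AAT = matmul(A, AT)    # 2 x 2
print(ATA, ATA == transpose(ATA))
print(AAT, AAT == transpose(AAT))
```

The two products have different sizes, but each equals its own transpose.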



Later in this text, we will obtain general conditions on $A$ under which $AA^T$ and $A^TA$ are invertible. However, in the special case where $A$ is square, we have the following result.

THEOREM 1.7.5

If $A$ is an invertible matrix, then $AA^T$ and $A^TA$ are also invertible.

Proof Since $A$ is invertible, so is $A^T$ by Theorem 1.4.9. Thus $AA^T$ and $A^TA$ are invertible, since they are the products of invertible matrices.



Concept Review 

• Diagonal matrix 

• Lower triangular matrix 

• Upper triangular matrix 

• Triangular matrix 

• Symmetric matrix

Skills 

• Determine whether a diagonal matrix is invertible with no computations. 

• Compute matrix products involving diagonal matrices by inspection. 

• Determine whether a matrix is triangular. 

• Understand how the transpose operation affects diagonal and triangular matrices. 

• Understand how inversion affects diagonal and triangular matrices. 

• Determine whether a matrix is a symmetric matrix. 



Exercise Set 1.7 



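In Exercises 1-4, determine whether the given matrix is invertible.

1.

$$
\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}
$$

Answer:

$$
\begin{bmatrix} \frac{1}{2} & 0 \\ 0 & -\frac{1}{5} \end{bmatrix}
$$

2.

$$
\begin{bmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 5 \end{bmatrix}
$$

3.

$$
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \frac{1}{3} \end{bmatrix}
$$

Answer:

$$
\begin{bmatrix} -1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & 3 \end{bmatrix}
$$

4.

$$
\begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & -2 \end{bmatrix}
$$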

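In Exercises 5-8, determine the product by inspection.

5.

$$
\begin{bmatrix} 3 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} 2 & 1 \\ -4 & 1 \\ 2 & 5 \end{bmatrix}
$$

Answer:

$$
\begin{bmatrix} 6 & 3 \\ 4 & -1 \\ 4 & 10 \end{bmatrix}
$$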


V J 1] 



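$$
\begin{bmatrix} -4 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}
$$

7.

$$
\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -3 \end{bmatrix}
\begin{bmatrix} -3 & 2 & 0 & 4 & -4 \\ 1 & -5 & 3 & 0 & 3 \\ -6 & 2 & 2 & 2 & 2 \end{bmatrix}
$$

Answer:

$$
\begin{bmatrix} -15 & 10 & 0 & 20 & -20 \\ 2 & -10 & 6 & 0 & 6 \\ 18 & -6 & -6 & -6 & -6 \end{bmatrix}
$$

8.

$$
\begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} 4 & -1 & 3 \\ 1 & 2 & 0 \\ -5 & 1 & -2 \end{bmatrix}
\begin{bmatrix} -3 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 2 \end{bmatrix}
$$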



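In Exercises 9-12, find $A^2$, $A^{-2}$, and $A^{-k}$ (where $k$ is any integer) by inspection.

9.

$$
A = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}
$$

Answer:

$$
A^2 = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}, \quad
A^{-2} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{4} \end{bmatrix}, \quad
A^{-k} = \begin{bmatrix} 1 & 0 \\ 0 & 1/(-2)^k \end{bmatrix}
$$

10.

$$
A = \begin{bmatrix} -6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}
$$

11.

$$
A = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{3} & 0 \\ 0 & 0 & \frac{1}{4} \end{bmatrix}
$$

Answer:

$$
A^2 = \begin{bmatrix} \frac{1}{4} & 0 & 0 \\ 0 & \frac{1}{9} & 0 \\ 0 & 0 & \frac{1}{16} \end{bmatrix}, \quad
A^{-2} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 16 \end{bmatrix}, \quad
A^{-k} = \begin{bmatrix} 2^k & 0 & 0 \\ 0 & 3^k & 0 \\ 0 & 0 & 4^k \end{bmatrix}
$$

12.

$$
A = \begin{bmatrix} -2 & 0 & 0 & 0 \\ 0 & -4 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}
$$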



In Exercises 13-19, decide whether the given matrix is symmetric. 



13.

$$
\begin{bmatrix} -8 & -8 \\ 0 & 0 \end{bmatrix}
$$

Answer: 

Not symmetric 



0 -7 

7 



Answer: 



Symmetric 



16. 



17. 



2 
-6 
6 



Answer: 



Not symmetric 



18. 



19. 



2 -1 
-1 5 

3 1 



0 0 1 
0 2 0 

3 0 0 



Answer: 



Not symmetric 

In Exercises 20-22, decide by inspection whether the given matrix is invertible. 

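20.

$$
\begin{bmatrix} -1 & 2 & 4 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}
$$

21.

$$
\begin{bmatrix} 0 & 1 & -2 & 5 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & -3 & 1 \\ 0 & 0 & 0 & 5 \end{bmatrix}
$$

Answer:

Not invertible

22.

$$
\begin{bmatrix} 2 & 0 & 0 & 0 \\ -3 & -1 & 0 & 0 \\ -4 & -6 & 0 & 0 \\ 0 & 3 & 8 & -5 \end{bmatrix}
$$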

In Exercises 23-24, find all values of the unknown constant(s) in order for $A$ to be symmetric.

23.

Answer:
$a = -8$

24.

$$
A = \begin{bmatrix} 2 & a - 2b + 2c & 2a + b + c \\ 3 & 5 & a + c \\ 0 & -2 & 7 \end{bmatrix}
$$



In Exercises 25-26, find all values of x in order for A to be invertible. 



25. 



A = 



x-\ X* 
0 x + 2 
0 0 x-4 



Answer:
$x \neq 1, -2, 4$



26. 



A = 



0 



X 



'-5 » 

4 



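In Exercises 27-28, find a diagonal matrix $A$ that satisfies the given condition.

27.

$$
A^5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
$$

Answer:

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
$$

28.

$$
A^{-2} = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$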



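29. Verify Theorem 1.7.1(b) for the product $AB$, where

$$
A = \begin{bmatrix} -1 & 2 & 5 \\ 0 & 1 & 3 \\ 0 & 0 & -4 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & -8 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}
$$

30. Verify Theorem 1.7.1(d) for the matrices $A$ and $B$ in Exercise 29.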

31. Verify Theorem 1.7.4 for the given matrix A. 



1]- 



(b)

$$
A = \begin{bmatrix} 1 & -2 & 3 \\ -2 & 1 & -7 \\ 3 & -7 & 4 \end{bmatrix}
$$



32. Let $A$ be an $n \times n$ symmetric matrix.
(a) Show that $A^2$ is symmetric.

(b) Show that $2A^2 - 3A + I$ is symmetric.

33. Prove: If $A^TA = A$, then $A$ is symmetric and $A = A^2$.

34. Find all $3 \times 3$ diagonal matrices $A$ that satisfy $A^2 - 3A - 4I = 0$.

35. Let $A = [a_{ij}]$ be an $n \times n$ matrix. Determine whether $A$ is symmetric.

(a) $a_{ij} = i^2 + j^2$

(b) $a_{ij} = i^2 - j^2$

(c) $a_{ij} = 2i + 2j$

(d) $a_{ij} = 2i^2 + 2j^3$

Answer:

(a) Yes

(b) No (unless $n = 1$)

(c) Yes

(d) No (unless $n = 1$)

36. On the basis of your experience with Exercise 35, devise a general test that can be applied to a formula for $a_{ij}$ to determine whether $A = [a_{ij}]$ is symmetric.

37. A square matrix $A$ is called skew-symmetric if $A^T = -A$.

Prove:

(a) If $A$ is an invertible skew-symmetric matrix, then $A^{-1}$ is skew-symmetric.

(b) If $A$ and $B$ are skew-symmetric matrices, then so are $A^T$, $A + B$, $A - B$, and $kA$ for any scalar $k$.

(c) Every square matrix $A$ can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Note the identity $A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$.]

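In Exercises 38-39, fill in the missing entries (marked with $\times$) to produce a skew-symmetric matrix.

38.

$$
A = \begin{bmatrix} \times & \times & 4 \\ 0 & \times & \times \\ \times & -1 & \times \end{bmatrix}
$$

39.

$$
A = \begin{bmatrix} \times & 0 & \times \\ \times & \times & -4 \\ 8 & \times & \times \end{bmatrix}
$$

Answer:

$$
\begin{bmatrix} 0 & 0 & -8 \\ 0 & 0 & -4 \\ 8 & 4 & 0 \end{bmatrix}
$$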

40. Find all values of $a$, $b$, $c$, and $d$ for which $A$ is skew-symmetric.

$$
A = \begin{bmatrix} 0 & 2a - 3b + c & 3a - 5b + 5c \\ -2 & 0 & 5a - 8b + 6c \\ -3 & -5 & d \end{bmatrix}
$$



41. We showed in the text that the product of symmetric matrices is symmetric if and only if the matrices commute. Is the product of commuting skew-symmetric matrices skew-symmetric? Explain. [Note: See Exercise 37 for the definition of skew-symmetric.]

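42. If the $n \times n$ matrix $A$ can be expressed as $A = LU$, where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix, then the linear system $A\mathbf{x} = \mathbf{b}$ can be expressed as $LU\mathbf{x} = \mathbf{b}$ and can be solved in two steps:

Step 1. Let $U\mathbf{x} = \mathbf{y}$, so that $LU\mathbf{x} = \mathbf{b}$ can be expressed as $L\mathbf{y} = \mathbf{b}$. Solve this system.

Step 2. Solve the system $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$.

In each part, use this two-step method to solve the given system.

(a)

$$
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 3 & 0 \\ 2 & 4 & 1 \end{bmatrix}
\begin{bmatrix} 2 & -1 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix}
$$

(b)

$$
\begin{bmatrix} 2 & 0 & 0 \\ 4 & 1 & 0 \\ -3 & -2 & 3 \end{bmatrix}
\begin{bmatrix} 3 & -5 & 2 \\ 0 & 4 & 1 \\ 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 4 \\ -5 \\ 2 \end{bmatrix}
$$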

43. Find an upper triangular matrix that satisfies

$$
A^3 = \begin{bmatrix} 1 & 30 \\ 0 & -8 \end{bmatrix}
$$

Answer:

$$
A = \begin{bmatrix} 1 & 10 \\ 0 & -2 \end{bmatrix}
$$



True-False Exercises 



In parts (a)-(m) determine whether the statement is true or false, and justify your answer. 



(a) The transpose of a diagonal matrix is a diagonal matrix. 



Answer: 

True 

(b) The transpose of an upper triangular matrix is an upper triangular matrix. 
Answer: 

False 

(c) The sum of an upper triangular matrix and a lower triangular matrix is a diagonal matrix. 
Answer: 

False 

(d) All entries of a symmetric matrix are determined by the entries occurring on and above the main diagonal. 
Answer: 

True 

(e) All entries of an upper triangular matrix are determined by the entries occurring on and above the main diagonal. 
Answer: 

True 

(f) The inverse of an invertible lower triangular matrix is an upper triangular matrix. 
Answer: 

False 

(g) A diagonal matrix is invertible if and only if all of its diagonal entries are positive. 
Answer: 

False 

(h) The sum of a diagonal matrix and a lower triangular matrix is a lower triangular matrix. 
Answer: 

True 

(i) A matrix that is both symmetric and upper triangular must be a diagonal matrix. 
Answer: 

True 

(j) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is symmetric, then $A$ and $B$ are symmetric.
Answer:

False

(k) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is upper triangular, then $A$ and $B$ are upper triangular.
Answer:
False

(l) If $A^2$ is a symmetric matrix, then $A$ is a symmetric matrix.

Answer:

False

(m) If $kA$ is a symmetric matrix for some $k \neq 0$, then $A$ is a symmetric matrix.
Answer:
True
1.8 Applications of Linear Systems

In this section we will discuss some relatively brief applications of linear systems. These are but a small sample of the wide variety of real-world problems to which our study of linear systems is applicable.

Network Analysis 

The concept of a network appears in a variety of applications. Loosely stated, a network is a set of branches through which 
something "flows." For example, the branches might be electrical wires through which electricity flows, pipes through 
which water or oil flows, traffic lanes through which vehicular traffic flows, or economic linkages through which money 
flows, to name a few possibilities. 

In most networks, the branches meet at points, called nodes or junctions, where the flow divides. For example, in an 
electrical network, nodes occur where three or more wires join, in a traffic network they occur at street intersections, and in 
a financial network they occur at banking centers where incoming money is distributed to individuals or other institutions. 

In the study of networks, there is generally some numerical measure of the rate at which the medium flows through a 
branch. For example, the flow rate of electricity is often measured in amperes, the flow rate of water or oil in gallons per 
minute, the flow rate of traffic in vehicles per hour, and the flow rate of European currency in millions of Euros per day. 
We will restrict our attention to networks in which there is flow conservation at each node, by which we mean that the rate 
of flow into any node is equal to the rate of flow out of that node. This ensures that the flow medium does not build up at 
the nodes and block the free movement of the medium through the network. 

A common problem in network analysis is to use known flow rates in certain branches to find the flow rates in all of the 
branches. Here is an example. 

EXAMPLE 1 Network Analysis Using Linear Systems

Figure 1.8.1 shows a network with four nodes in which the flow rate and direction of flow in certain branches are known. Find the flow rates and directions of flow in the remaining branches.

Figure 1.8.1



Solution As illustrated in Figure 1.8.2, we have assigned arbitrary directions to the unknown flow rates $x_1$, $x_2$, and $x_3$. We need not be concerned if some of the directions are incorrect, since an incorrect direction will be signaled by a negative value for the flow rate when we solve for the unknowns.

Figure 1.8.2



It follows from the conservation of flow at node $A$ that

$$
x_1 + x_2 = 30
$$

Similarly, at the other nodes we have

$$
\begin{aligned}
x_2 + x_3 &= 35 && \text{(node } B\text{)} \\
x_3 + 15 &= 60 && \text{(node } C\text{)} \\
x_1 + 15 &= 55 && \text{(node } D\text{)}
\end{aligned}
$$

These four conditions produce the linear system

$$
\begin{aligned}
x_1 + x_2 &= 30 \\
x_2 + x_3 &= 35 \\
x_3 &= 45 \\
x_1 &= 40
\end{aligned}
$$

which we can now try to solve for the unknown flow rates. In this particular case the system is sufficiently simple that it can be solved by inspection (work from the bottom up). We leave it for you to confirm that the solution is

$$
x_1 = 40, \quad x_2 = -10, \quad x_3 = 45
$$

The fact that $x_2$ is negative tells us that the direction assigned to that flow in Figure 1.8.2 is incorrect; that is, the flow in that branch is into node $A$.
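
The "work from the bottom up" step can be written out explicitly. The following is a minimal sketch of that arithmetic using the four node equations of this example; the variable names mirror $x_1$, $x_2$, $x_3$.

```python
# Node balance equations from Example 1, solved from the bottom up
x3 = 60 - 15           # node C: x3 + 15 = 60
x1 = 55 - 15           # node D: x1 + 15 = 55
x2 = 30 - x1           # node A: x1 + x2 = 30

# Node B was not used above, so it serves as a consistency check.
node_b_balanced = (x2 + x3 == 35)

print(x1, x2, x3, node_b_balanced)
if x2 < 0:
    print("x2 is negative: that flow runs opposite to the assigned direction")
```

There are four equations in three unknowns, so one equation is left over as a check on the other three.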

EXAMPLE 2 Design of Traffic Patterns

The network in Figure 1.8.3 shows a proposed plan for the traffic flow around a new park that will house the 
Liberty Bell in Philadelphia, Pennsylvania. The plan calls for a computerized traffic light at the north exit on 
Fifth Street, and the diagram indicates the average number of vehicles per hour that are expected to flow in 
and out of the streets that border the complex. All streets are one-way. 

(a) How many vehicles per hour should the traffic light let through to ensure that the average number of 
vehicles per hour flowing into the complex is the same as the average number of vehicles flowing out? 

(b) Assuming that the traffic light has been set to balance the total flow in and out of the complex, what can 
you say about the average number of vehicles per hour that will flow along the streets that border the 
complex? 
Figure 1.8.3

Solution

(a) If, as indicated in Figure 1.8.3b, we let $x$ denote the number of vehicles per hour that the traffic light must let through, then the total number of vehicles per hour that flow in and out of the complex will be

Flowing in: $500 + 400 + 600 + 200 = 1700$
Flowing out: $x + 700 + 400$

Equating the flows in and out shows that the traffic light should let $x = 600$ vehicles per hour pass through.

(b) To avoid traffic congestion, the flow in must equal the flow out at each intersection. For this to happen, the following conditions must be satisfied:

$$
\begin{array}{lrcl}
\text{Intersection} & \text{Flow In} & & \text{Flow Out} \\
A & 400 + 600 & = & x_1 + x_2 \\
B & x_2 + x_3 & = & 400 + x \\
C & 500 + 200 & = & x_3 + x_4 \\
D & x_1 + x_4 & = & 700
\end{array}
$$

Thus, with $x = 600$, as computed in part (a), we obtain the following linear system:

$$
\begin{aligned}
x_1 + x_2 &= 1000 \\
x_2 + x_3 &= 1000 \\
x_3 + x_4 &= 700 \\
x_1 + x_4 &= 700
\end{aligned}
$$

We leave it for you to show that the system has infinitely many solutions and that these are given by the parametric equations

$$
x_1 = 700 - t, \quad x_2 = 300 + t, \quad x_3 = 700 - t, \quad x_4 = t \tag{1}
$$

However, the parameter $t$ is not completely arbitrary here, since there are physical constraints to be considered. For example, the average flow rates must be nonnegative since we have assumed the streets to be one-way, and a negative flow rate would indicate a flow in the wrong direction. This being the case, we see from (1) that $t$ can be any real number that satisfies $0 \le t \le 700$, which implies that the average flow rates along the streets will fall in the ranges

$$
0 \le x_1 \le 700, \quad 300 \le x_2 \le 1000, \quad 0 \le x_3 \le 700, \quad 0 \le x_4 \le 700
$$
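
The parametric solution and the constraint on $t$ can be checked with a short sketch. The function names `flows` and `balanced` are illustrative, and the sampled values of $t$ (multiples of 100 from $-100$ to $800$) are an arbitrary choice made only to show values inside and outside the admissible range.

```python
def flows(t):
    """Flow rates (x1, x2, x3, x4) from the parametric equations (1)."""
    return (700 - t, 300 + t, 700 - t, t)

def balanced(x1, x2, x3, x4):
    """Check the four intersection equations of part (b)."""
    return (x1 + x2 == 1000 and x2 + x3 == 1000
            and x3 + x4 == 700 and x1 + x4 == 700)

# Every t satisfies the equations; only some t give nonnegative flows.
feasible = [t for t in range(-100, 900, 100)
            if balanced(*flows(t)) and min(flows(t)) >= 0]
print(feasible)
```

The sampled values that survive are exactly those with $0 \le t \le 700$, matching the range derived in the text.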



Electrical Circuits 



Next, we will show how network analysis can be used to analyze electrical circuits consisting of batteries and resistors. A battery is a source of electric energy, and a resistor, such as a lightbulb, is an element that dissipates electric energy. Figure 1.8.4 shows a schematic diagram of a circuit with one battery (represented by the battery symbol in the figure), one resistor (represented by the zigzag resistor symbol in the figure), and a switch. The battery has a positive pole (+) and a negative pole (-). When the switch is closed, electrical current is considered to flow from the positive pole of the battery, through the resistor, and back to the negative pole (indicated by the arrowhead in the figure).

Figure 1.8.4



Electrical current, which is a flow of electrons through wires, behaves much like the flow of water through pipes. A battery acts like a pump that creates "electrical pressure" to increase the flow rate of electrons, and a resistor acts like a restriction in a pipe that reduces the flow rate of electrons. The technical term for electrical pressure is electrical potential; it is commonly measured in volts (V). The degree to which a resistor reduces the electrical potential is called its resistance and is commonly measured in ohms ($\Omega$). The rate of flow of electrons in a wire is called current and is commonly measured in amperes (also called amps) (A). The precise effect of a resistor is given by the following law:

Ohm's Law

If a current of $I$ amperes passes through a resistor with a resistance of $R$ ohms, then there is a resulting drop of $E$ volts in electrical potential that is the product of the current and resistance; that is,

$$
E = IR
$$



A typical electrical network will have multiple batteries and resistors joined by some configuration of wires. A point at 
which three or more wires in a network are joined is called a node (or junction point). A branch is a wire connecting two 
nodes, and a closed loop is a succession of connected branches that begin and end at the same node. For example, the 
electrical network in Figure 1.8.5 has two nodes and three closed loops — two inner loops and one outer loop. As current 
flows through an electrical network, it undergoes increases and decreases in electrical potential, called voltage rises and 
voltage drops, respectively. The behavior of the current at the nodes and around closed loops is governed by two 
fundamental laws: 




Figure 1.8.5 



Kirchhoff's Current Law

The sum of the currents flowing into any node is equal to the sum of the currents flowing out.

Kirchhoff's Voltage Law

In one traversal of any closed loop, the sum of the voltage rises equals the sum of the voltage drops.

Kirchhoff's current law is a restatement of the principle of flow conservation at a node that was stated for general networks. Thus, for example, the currents at the top node in Figure 1.8.6 satisfy the equation $I_1 = I_2 + I_3$.



In circuits with multiple loops and batteries there is usually no way to tell in advance which way the currents are flowing, 
so the usual procedure in circuit analysis is to assign arbitrary directions to the current flows in the branches and let the 
mathematical computations determine whether the assignments are correct. In addition to assigning directions to the 
current flows, Kirchhoff's voltage law requires a direction of travel for each closed loop. The choice is arbitrary, but for
consistency we will always take this direction to be clockwise (Figure 1.8.7). We also make the following conventions: 

• A voltage drop occurs at a resistor if the direction assigned to the current through the resistor is the same as the direction 
assigned to the loop, and a voltage rise occurs at a resistor if the direction assigned to the current through the resistor is 
the opposite to that assigned to the loop. 

• A voltage rise occurs at a battery if the direction assigned to the loop is from - to + through the battery, and a voltage 
drop occurs at a battery if the direction assigned to the loop is from + to - through the battery. 

If you follow these conventions when calculating currents, then those currents whose directions were assigned correctly 
will have positive values and those whose directions were assigned incorrectly will have negative values. 




Figure 1.8.6 




Clockwise closed-loop convention with arbitrary direction assignments to currents in the branches

Figure 1.8.7



EXAMPLE 3 A Circuit with One Closed Loop

Determine the current $I$ in the circuit shown in Figure 1.8.8.

Figure 1.8.8

Solution Since the direction assigned to the current through the resistor is the same as the direction of the loop, there is a voltage drop at the resistor. By Ohm's law this voltage drop is $E = IR = 3I$. Also, since the direction assigned to the loop is from - to + through the battery, there is a voltage rise of 6 volts at the battery. Thus, it follows from Kirchhoff's voltage law that

$$
3I = 6
$$

from which we conclude that the current is $I = 2$ A. Since $I$ is positive, the direction assigned to the current flow is correct.



EXAMPLE 4 A Circuit with Three Closed Loops

Determine the currents \(I_1\), \(I_2\), and \(I_3\) in the circuit shown in Figure 1.8.9.

Figure 1.8.9

Solution Using the assigned directions for the currents, Kirchhoff's current law provides one equation for
each node:

Node    Current In          Current Out
A       \(I_1 + I_2\)   =   \(I_3\)
B       \(I_3\)         =   \(I_1 + I_2\)

However, these equations are really the same, since both can be expressed as

\[ I_1 + I_2 - I_3 = 0 \tag{2} \]



Gustav Kirchhoff (1824-1887) 



Historical Note The German physicist Gustav Kirchhoff was a student of Gauss. His work on
Kirchhoff's laws, announced in 1854, was a major advance in the calculation of currents, voltages,
and resistances of electrical circuits. Kirchhoff was severely disabled and spent most of his life on
crutches or in a wheelchair.
[Image: © SSPL/The Image Works]



To find unique values for the currents we will need two more equations, which we will obtain from
Kirchhoff's voltage law. We can see from the network diagram that there are three closed loops: a left inner
loop containing the 50 V battery, a right inner loop containing the 30 V battery, and an outer loop that
contains both batteries. Thus, Kirchhoff's voltage law will actually produce three equations. With a
clockwise traversal of the loops, the voltage rises and drops in these loops are as follows:

                     Voltage Rises               Voltage Drops
Left Inside Loop     \(50\)                      \(5I_1 + 20I_3\)
Right Inside Loop    \(30 + 10I_2 + 20I_3\)      \(0\)
Outside Loop         \(30 + 50 + 10I_2\)         \(5I_1\)

These conditions can be rewritten as

\[
\begin{aligned}
5I_1 + 20I_3 &= 50 \\
10I_2 + 20I_3 &= -30 \\
5I_1 - 10I_2 &= 80
\end{aligned} \tag{3}
\]

However, the last equation is superfluous, since it is the difference of the first two. Thus, if we combine (2)
and the first two equations in (3), we obtain the following linear system of three equations in the three
unknown currents:

\[
\begin{aligned}
I_1 + I_2 - I_3 &= 0 \\
5I_1 + 20I_3 &= 50 \\
10I_2 + 20I_3 &= -30
\end{aligned}
\]

We leave it for you to solve this system and show that \(I_1 = 6\) A, \(I_2 = -5\) A, and \(I_3 = 1\) A. The fact that \(I_2\)
is negative tells us that the direction of this current is opposite to that indicated in Figure 1.8.9.
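The system in Example 4 can also be solved mechanically. The following is a minimal sketch of Gauss-Jordan elimination in Python using exact fractions; the function name `gauss_jordan` and the ordering of the unknowns as (I1, I2, I3) are choices made here for illustration, not something prescribed by the text.

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Return the reduced row echelon form of an augmented matrix (list of rows)."""
    m = [[Fraction(v) for v in row] for row in aug]
    rows, cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(cols - 1):  # the last column holds the right-hand sides
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [v / lead for v in m[pivot_row]]  # make the leading entry 1
        for r in range(rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return m

# Rows: the node equation, then the left-loop and right-loop equations.
circuit = [
    [1, 1, -1, 0],
    [5, 0, 20, 50],
    [0, 10, 20, -30],
]
rref = gauss_jordan(circuit)
currents = [row[-1] for row in rref]  # I1, I2, I3 in amperes
print([str(c) for c in currents])
```

The last column of the reduced matrix should reproduce the currents stated in the example, including the negative value that signals a reversed direction.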



Balancing Chemical Equations 



Chemical compounds are represented by chemical formulas that describe the atomic makeup of their molecules. For 
example, water is composed of two hydrogen atoms and one oxygen atom, so its chemical formula is H2O; and stable 
oxygen is composed of two oxygen atoms, so its chemical formula is O2. 



When chemical compounds are combined under the right conditions, the atoms in their molecules rearrange to form new 
compounds. For example, when methane burns, the methane (CH4) and stable oxygen (O2) react to form carbon dioxide
(CO2) and water (H2O). This is indicated by the chemical equation 

CH4 + O2 → CO2 + H2O (4)

The molecules to the left of the arrow are called the reactants and those to the right the products. In this equation the plus 
signs serve to separate the molecules and are not intended as algebraic operations. However, this equation does not tell the 
whole story, since it fails to account for the proportions of molecules required for a complete reaction (no reactants left 
over). For example, we can see from the right side of (4) that to produce one molecule of carbon dioxide and one molecule
of water, one needs three oxygen atoms for each carbon atom. However, from the left side of (4) we see that one molecule of
methane and one molecule of stable oxygen have only two oxygen atoms for each carbon atom. Thus, on the reactant side 
the ratio of methane to stable oxygen cannot be one-to-one in a complete reaction. 

A chemical equation is said to be balanced if for each type of atom in the reaction, the same number of atoms appears on 
each side of the arrow. For example, the balanced version of Equation 4 is 

CH4 + 2O2 → CO2 + 2H2O (5)

by which we mean that one methane molecule combines with two stable oxygen molecules to produce one carbon dioxide 
molecule and two water molecules. In theory, one could multiply this equation through by any positive integer. For 
example, multiplying through by 2 yields the balanced chemical equation 

2CH4 + 4O2 → 2CO2 + 4H2O
However, the standard convention is to use the smallest positive integers that will balance the equation. 

Equation 4 is sufficiently simple that it could have been balanced by trial and error, but for more complicated chemical 
equations we will need a systematic method. There are various methods that can be used, but we will give one that uses 
systems of linear equations. To illustrate the method let us reexamine Equation 4. To balance this equation we must find
positive integers \(x_1\), \(x_2\), \(x_3\), and \(x_4\) such that

\(x_1\)(CH4) + \(x_2\)(O2) → \(x_3\)(CO2) + \(x_4\)(H2O) (6)

For each of the atoms in the equation, the number of atoms on the left must be equal to the number of atoms on the right.
Expressing this in tabular form we have

            Left Side       Right Side
Carbon      \(x_1\)     =   \(x_3\)
Hydrogen    \(4x_1\)    =   \(2x_4\)
Oxygen      \(2x_2\)    =   \(2x_3 + x_4\)

from which we obtain the homogeneous linear system

\[
\begin{aligned}
x_1 - x_3 &= 0 \\
4x_1 - 2x_4 &= 0 \\
2x_2 - 2x_3 - x_4 &= 0
\end{aligned}
\]

The augmented matrix for this system is

\[
\begin{bmatrix}
1 & 0 & -1 & 0 & 0 \\
4 & 0 & 0 & -2 & 0 \\
0 & 2 & -2 & -1 & 0
\end{bmatrix}
\]

We leave it for you to show that the reduced row echelon form of this matrix is

\[
\begin{bmatrix}
1 & 0 & 0 & -\frac{1}{2} & 0 \\
0 & 1 & 0 & -1 & 0 \\
0 & 0 & 1 & -\frac{1}{2} & 0
\end{bmatrix}
\]

from which we conclude that the general solution of the system is

\[ x_1 = t/2, \quad x_2 = t, \quad x_3 = t/2, \quad x_4 = t \]

where \(t\) is arbitrary. The smallest positive integer values for the unknowns occur when we let \(t = 2\), so the equation can be
balanced by letting \(x_1 = 1\), \(x_2 = 2\), \(x_3 = 1\), \(x_4 = 2\). This agrees with our earlier conclusions, since substituting these
values into Equation 6 yields Equation 5.
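The final step, passing from the one-parameter general solution to the smallest positive integers, amounts to clearing denominators. A minimal Python sketch follows, assuming the solution for t = 1 has already been read off the reduced row echelon form; the helper name `smallest_integer_multiple` is hypothetical and used only for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def smallest_integer_multiple(solution):
    """Scale a vector of positive fractions to the smallest positive integer multiple."""
    # The least common multiple of the denominators clears every fraction.
    lcm_den = reduce(lambda a, b: a * b // gcd(a, b),
                     (f.denominator for f in solution), 1)
    scaled = [int(f * lcm_den) for f in solution]
    # Divide out any common factor that remains.
    common = reduce(gcd, scaled)
    return [s // common for s in scaled]

# General solution of the methane system with t = 1: (1/2, 1, 1/2, 1).
methane = [Fraction(1, 2), Fraction(1), Fraction(1, 2), Fraction(1)]
print(smallest_integer_multiple(methane))
```

The same helper applies to any balancing problem whose solution space is one-dimensional; it is not meant to handle systems with more than one free parameter.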

EXAMPLE 5 Balancing Chemical Equations Using Linear Systems

Balance the chemical equation

HCl + Na3PO4 → H3PO4 + NaCl

[hydrochloric acid] + [sodium phosphate] → [phosphoric acid] + [sodium chloride]

Solution Let \(x_1\), \(x_2\), \(x_3\), and \(x_4\) be positive integers that balance the equation

\(x_1\)(HCl) + \(x_2\)(Na3PO4) → \(x_3\)(H3PO4) + \(x_4\)(NaCl) (7)

Equating the number of atoms of each type on the two sides yields

\(1x_1 = 3x_3\)    Hydrogen (H)
\(1x_1 = 1x_4\)    Chlorine (Cl)
\(3x_2 = 1x_4\)    Sodium (Na)
\(1x_2 = 1x_3\)    Phosphorus (P)
\(4x_2 = 4x_3\)    Oxygen (O)

from which we obtain the homogeneous linear system

\[
\begin{aligned}
x_1 - 3x_3 &= 0 \\
x_1 - x_4 &= 0 \\
3x_2 - x_4 &= 0 \\
x_2 - x_3 &= 0 \\
4x_2 - 4x_3 &= 0
\end{aligned}
\]

We leave it for you to show that the reduced row echelon form of the augmented matrix for this system is

\[
\begin{bmatrix}
1 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & -\frac{1}{3} & 0 \\
0 & 0 & 1 & -\frac{1}{3} & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

from which we conclude that the general solution of the system is

\[ x_1 = t, \quad x_2 = t/3, \quad x_3 = t/3, \quad x_4 = t \]

where \(t\) is arbitrary. To obtain the smallest positive integers that balance the equation, we let \(t = 3\), in which
case we obtain \(x_1 = 3\), \(x_2 = 1\), \(x_3 = 1\), and \(x_4 = 3\). Substituting these values in (7) produces the balanced
equation

3HCl + Na3PO4 → H3PO4 + 3NaCl
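As a check that is independent of row reduction, one can count atoms directly. The sketch below is not the method of the text: it is a brute-force search over small coefficients, and the names `atom_totals`, `is_balanced`, and `smallest_balance`, as well as the search limit of 10, are assumptions made for illustration.

```python
from itertools import product

# Atom counts per molecule, read off the formulas in Example 5.
reactants = [{"H": 1, "Cl": 1}, {"Na": 3, "P": 1, "O": 4}]  # HCl, Na3PO4
products = [{"H": 3, "P": 1, "O": 4}, {"Na": 1, "Cl": 1}]   # H3PO4, NaCl

def atom_totals(molecules, coeffs):
    """Total number of atoms of each element for the given coefficients."""
    totals = {}
    for molecule, k in zip(molecules, coeffs):
        for element, count in molecule.items():
            totals[element] = totals.get(element, 0) + k * count
    return totals

def is_balanced(left_coeffs, right_coeffs):
    """True if every element appears equally often on both sides."""
    return atom_totals(reactants, left_coeffs) == atom_totals(products, right_coeffs)

def smallest_balance(limit=10):
    """Try coefficient tuples in order of increasing total; return the first that balances."""
    candidates = product(range(1, limit + 1), repeat=4)
    for x1, x2, x3, x4 in sorted(candidates, key=sum):
        if is_balanced((x1, x2), (x3, x4)):
            return x1, x2, x3, x4
    return None

print(is_balanced((3, 1), (1, 3)))
print(smallest_balance())
```

A search like this is only practical for small coefficients; the linear-system approach scales to equations where trial and error does not.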



Polynomial Interpolation 

An important problem in various applications is to find a polynomial whose graph passes through a specified set of points
in the plane; this is called an interpolating polynomial for the points. The simplest example of such a problem is to find a
linear polynomial

\[ p(x) = ax + b \tag{8} \]

whose graph passes through two known distinct points, \((x_1, y_1)\) and \((x_2, y_2)\), in the xy-plane (Figure 1.8.10). You have
probably encountered various methods in analytic geometry for finding the equation of a line through two points, but here
we will give a method based on linear systems that can be adapted to general polynomial interpolation.

Figure 1.8.10



The graph of (8) is the line \(y = ax + b\), and for this line to pass through the points \((x_1, y_1)\) and \((x_2, y_2)\), we must have

\[ y_1 = ax_1 + b \quad \text{and} \quad y_2 = ax_2 + b \]

Therefore, the unknown coefficients a and b can be obtained by solving the linear system

\[
\begin{aligned}
ax_1 + b &= y_1 \\
ax_2 + b &= y_2
\end{aligned}
\]

We don't need any fancy methods to solve this system: the value of a can be obtained by subtracting the equations to
eliminate b, and then the value of a can be substituted into either equation to find b. We leave it as an exercise for you to
find a and b and then show that they can be expressed in the form

\[ a = \frac{y_2 - y_1}{x_2 - x_1} \quad \text{and} \quad b = \frac{y_1 x_2 - y_2 x_1}{x_2 - x_1} \tag{9} \]

provided \(x_1 \neq x_2\). Thus, for example, the line \(y = ax + b\) that passes through the points

(2, 1) and (5, 4)

can be obtained by taking \((x_1, y_1) = (2, 1)\) and \((x_2, y_2) = (5, 4)\), in which case (9) yields

\[ a = \frac{4 - 1}{5 - 2} = 1 \quad \text{and} \quad b = \frac{(1)(5) - (4)(2)}{5 - 2} = -1 \]

Therefore, the equation of the line is

\[ y = x - 1 \]

(Figure 1.8.11).

Figure 1.8.11
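The closed forms a = (y2 - y1)/(x2 - x1) and b = (y1 x2 - y2 x1)/(x2 - x1) translate directly into code. The following is a small sketch, assuming integer coordinates so that exact fractions can be used; the function name `line_through` is hypothetical.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Coefficients (a, b) of the line y = ax + b through two points with distinct x-coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have distinct x-coordinates")
    a = Fraction(y2 - y1, x2 - x1)
    b = Fraction(y1 * x2 - y2 * x1, x2 - x1)
    return a, b

a, b = line_through((2, 1), (5, 4))
print(a, b)
```

The explicit check on the x-coordinates mirrors the proviso in the text: a vertical line cannot be written in the form y = ax + b.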



Now let us consider the more general problem of finding a polynomial whose graph passes through n points with distinct
x-coordinates

\[ (x_1, y_1), \; (x_2, y_2), \; (x_3, y_3), \; \ldots, \; (x_n, y_n) \tag{10} \]

Since there are n conditions to be satisfied, intuition suggests that we should begin by looking for a polynomial of the form

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{11} \]

since a polynomial of this form has n coefficients that are at our disposal to satisfy the n conditions. However, we want to
allow for cases where the points may lie on a line or have some other configuration that would make it possible to use a
polynomial whose degree is less than \(n - 1\); thus, we allow for the possibility that \(a_{n-1}\) and other coefficients in (11) may
be zero.

The following theorem, which we will prove later in the text, is the basic result on polynomial interpolation. 



THEOREM 1.8.1 Polynomial Interpolation 

Given any n points in the xy-plane that have distinct x-coordinates, there is a unique polynomial of degree \(n - 1\)
or less whose graph passes through those points. 



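Let us now consider how we might go about finding the interpolating polynomial (11) whose graph passes through the points
in (10). Since the graph of this polynomial is the graph of the equation

\[ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{12} \]

it follows that the coordinates of the points must satisfy

\[
\begin{aligned}
a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_{n-1} x_1^{n-1} &= y_1 \\
a_0 + a_1 x_2 + a_2 x_2^2 + \cdots + a_{n-1} x_2^{n-1} &= y_2 \\
&\;\;\vdots \\
a_0 + a_1 x_n + a_2 x_n^2 + \cdots + a_{n-1} x_n^{n-1} &= y_n
\end{aligned} \tag{13}
\]

In these equations the values of the x's and y's are assumed to be known, so we can view this as a linear system in the
unknowns \(a_0, a_1, \ldots, a_{n-1}\). From this point of view the augmented matrix for the system is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} & y_1 \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} & y_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1} & y_n
\end{bmatrix} \tag{14}
\]

and hence the interpolating polynomial can be found by reducing this matrix to reduced row echelon form (Gauss-Jordan
elimination).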

EXAMPLE 6 Polynomial Interpolation by Gauss-Jordan Elimination

Find a cubic polynomial whose graph passes through the points

(1, 3), (2, -2), (3, -5), (4, 0)

Solution Since there are four points, we will use an interpolating polynomial of degree 3. Denote this
polynomial by

\[ p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \]

and denote the x- and y-coordinates of the given points by

\[ x_1 = 1, \; x_2 = 2, \; x_3 = 3, \; x_4 = 4 \quad \text{and} \quad y_1 = 3, \; y_2 = -2, \; y_3 = -5, \; y_4 = 0 \]

Thus, it follows from (14) that the augmented matrix for the linear system in the unknowns \(a_0\), \(a_1\), \(a_2\), and \(a_3\)
is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & x_1^3 & y_1 \\
1 & x_2 & x_2^2 & x_2^3 & y_2 \\
1 & x_3 & x_3^2 & x_3^3 & y_3 \\
1 & x_4 & x_4^2 & x_4^3 & y_4
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 3 \\
1 & 2 & 4 & 8 & -2 \\
1 & 3 & 9 & 27 & -5 \\
1 & 4 & 16 & 64 & 0
\end{bmatrix}
\]

We leave it for you to confirm that the reduced row echelon form of this matrix is

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 4 \\
0 & 1 & 0 & 0 & 3 \\
0 & 0 & 1 & 0 & -5 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
\]

from which it follows that \(a_0 = 4\), \(a_1 = 3\), \(a_2 = -5\), \(a_3 = 1\). Thus, the interpolating polynomial is

\[ p(x) = 4 + 3x - 5x^2 + x^3 \]

The graph of this polynomial and the given points are shown in Figure 1.8.12.

Figure 1.8.12



Remark Later we will give a more efficient method for finding interpolating polynomials that is better suited for 
problems in which the number of data points is large. 
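To make the setup of Example 6 concrete, the sketch below builds the augmented matrix row by row from the four points and then checks that the coefficients stated in the example satisfy every row. It verifies the answer rather than performing the row reduction; the helper name `satisfies` is an assumption made for illustration.

```python
points = [(1, 3), (2, -2), (3, -5), (4, 0)]
n = len(points)

# Row i of the augmented matrix is [1, x_i, x_i^2, ..., x_i^(n-1), y_i].
augmented = [[x ** k for k in range(n)] + [y] for x, y in points]

coeffs = [4, 3, -5, 1]  # a0, a1, a2, a3 as stated in Example 6

def satisfies(row, a):
    """True if the coefficients a satisfy the equation encoded by one augmented row."""
    *powers, rhs = row
    return sum(p * c for p, c in zip(powers, a)) == rhs

for row in augmented:
    print(row, satisfies(row, coeffs))
```

Because the x-coordinates are distinct, the interpolation theorem guarantees that no other polynomial of degree 3 or less passes this check.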

CALCULUS AND CALCULATING UTILITY REQUIRED 

EXAMPLE 7 Approximate Integration

There is no way to evaluate the integral

\[ \int_0^1 \sin\left(\frac{\pi x^2}{2}\right) dx \]

directly since there is no way to express an antiderivative of the integrand in terms of elementary functions.
This integral could be approximated by Simpson's rule or some comparable method, but an alternative
approach is to approximate the integrand by an interpolating polynomial and integrate the approximating
polynomial. For example, let us consider the five points

\[ x_0 = 0, \; x_1 = 0.25, \; x_2 = 0.5, \; x_3 = 0.75, \; x_4 = 1 \]

that divide the interval [0, 1] into four equally spaced subintervals. The values of

\[ f(x) = \sin\left(\frac{\pi x^2}{2}\right) \]

at these points are approximately

\[ f(0) = 0, \; f(0.25) = 0.098017, \; f(0.5) = 0.382683, \; f(0.75) = 0.77301, \; f(1) = 1 \]

The interpolating polynomial is (verify)

\[ p(x) = 0.098796x + 0.762356x^2 + 2.14429x^3 - 2.00544x^4 \tag{15} \]

and

\[ \int_0^1 p(x)\,dx \approx 0.438501 \tag{16} \]

As shown in Figure 1.8.13, the graphs of f and p match very closely over the interval [0, 1], so the
approximation is quite good.
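The quality of the approximation in Example 7 can be examined numerically. The sketch below integrates the polynomial with coefficients 0.098796, 0.762356, 2.14429, and -2.00544 term by term and compares the result with a composite Simpson's rule estimate of the original integral of sin(πx²/2); the choice of 100 subintervals and the function names are assumptions made for illustration.

```python
import math

def f(x):
    """Integrand of Example 7."""
    return math.sin(math.pi * x * x / 2)

# Coefficients of p(x), lowest degree first: p(x) = sum(c[k] * x**k).
c = [0.0, 0.098796, 0.762356, 2.14429, -2.00544]

# Term-by-term integration on [0, 1]: the integral of x**k is 1 / (k + 1).
integral_p = sum(ck / (k + 1) for k, ck in enumerate(c))

def simpson(g, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

integral_f = simpson(f, 0.0, 1.0, 100)
print(round(integral_p, 6), round(integral_f, 6))
```

The two printed values should agree to about three decimal places, which gives a rough sense of how much accuracy is lost by using only five interpolation points.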
Figure 1.8.13



Concept Review 

• Network 

• Branches 

• Nodes 

• Flow conservation 

• Electrical circuits: battery, resistor, poles (positive and negative), electrical potential, Ohm's law, Kirchhoff's
current law, Kirchhoff's voltage law

• Chemical equations: reactants, products, balanced equation 

• Interpolating polynomial 

Skills 

• Find the flow rates and directions of flow in branches of a network. 

• Find the amount of current flowing through parts of an electrical circuit. 

• Write a balanced chemical equation for a given chemical reaction. 

• Find an interpolating polynomial for a graph passing through a given collection of points. 



Exercise Set 1.8



1. The accompanying figure shows a network in which the flow rate and direction of flow in certain branches are known. 
Find the flow rates and directions of flow in the remaining branches. 





Answer: 




2. The accompanying figure shows known flow rates of hydrocarbons into and out of a network of pipes at an oil refinery.

(a) Set up a linear system whose solution provides the unknown flow rates.

(b) Solve the system for the unknown flow rates.

(c) Find the flow rates and directions of flow if \(x_4 = 50\) and \(x_6 = 0\).

Figure Ex-2

3. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow 
rates along the streets are measured as the average number of vehicles per hour. 

(a) Set up a linear system whose solution provides the unknown flow rates. 

(b) Solve the system for the unknown flow rates. 

(c) If the flow along the road from A to B must be reduced for construction, what is the minimum flow that is required
to keep traffic flowing on all roads?

Figure Ex-3

Answer:

(a) \(x_3 - x_4 = -500\), \(-x_1 + x_4 = 100\), \(x_1 - x_2 = 300\), \(x_2 - x_3 = 100\)

(b) \(x_1 = -100 + t\), \(x_2 = -400 + t\), \(x_3 = -500 + t\), \(x_4 = t\)

(c) For all rates to be nonnegative, we need \(t \geq 500\); the minimum is \(t = 500\) cars per hour, so \(x_1 = 400\), \(x_2 = 100\), \(x_3 = 0\), \(x_4 = 500\)

4. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow 
rates along the streets are measured as the average number of vehicles per hour. 

(a) Set up a linear system whose solution provides the unknown flow rates. 

(b) Solve the system for the unknown flow rates. 

(c) Is it possible to close the road from A to B for construction and keep traffic flowing on the other streets? Explain.

In Exercises 5-8, analyze the given electrical circuits by finding the unknown currents.



5. 




Answer: 




In Exercises 9-12, write a balanced equation for the given chemical reaction. 

9. C3H8 + O2 → CO2 + H2O (propane combustion)
Answer:

\(x_1 = 1\), \(x_2 = 5\), \(x_3 = 3\), and \(x_4 = 4\); the balanced equation is C3H8 + 5O2 → 3CO2 + 4H2O

10. C6H12O6 → CO2 + C2H5OH (fermentation of sugar)

11. CH3COF + H2O → CH3COOH + HF

Answer:

\(x_1 = x_2 = x_3 = x_4 = t\); the balanced equation is CH3COF + H2O → CH3COOH + HF

12. CO2 + H2O → C6H12O6 + O2 (photosynthesis)

13. Find the quadratic polynomial whose graph passes through the points (1, 1), (2, 2), and (3, 5). 
Answer: 

\(p(x) = x^2 - 2x + 2\)

14. Find the quadratic polynomial whose graph passes through the points (0, 0), (-1, 1), and (1, 1). 

15. Find the cubic polynomial whose graph passes through the points (-1, -1), (0, 1), (1, 3), (4, -1). 

Answer: 

16. The accompanying figure shows the graph of a cubic polynomial. Find the polynomial.

Figure Ex-16

17. (a) Find an equation that represents the family of all second-degree polynomials that pass through the points (0, 1) and
(1, 2). [Hint: The equation will involve one arbitrary parameter that produces the members of the family when
varied.]

(b) By hand, or with the help of a graphing utility, sketch four curves in the family.
Answer:

(a) Using \(a_1 = k\) as a parameter, \(p(x) = 1 + kx + (1 - k)x^2\) where \(-\infty < k < \infty\).

(b) The graphs for k = 0, 1, 2, and 3 are shown.




18. In this section we have selected only a few applications of linear systems. Using the Internet as a search tool, try to find 
some more real-world applications of such systems. Select one that is of interest to you, and write a paragraph about it. 



True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) In any network, the sum of the flows out of a node must equal the sum of the flows into a node. 
Answer: 

True 

(b) When a current passes through a resistor, there is an increase in the electrical potential in a circuit. 
Answer: 

False 

(c) Kirchhoff's current law states that the sum of the currents flowing into a node equals the sum of the currents flowing out
of the node. 

Answer: 

True 

(d) A chemical equation is called balanced if the total number of atoms on each side of the equation is the same.
Answer: 

False 

(e) Given any n points in the xy-plane, there is a unique polynomial of degree \(n - 1\) or less whose graph passes through
those points. 

Answer: 

False

1.9 Leontief Input-Output Models

In 1973 the economist Wassily Leontief was awarded the Nobel prize for his work on economic modeling in which he 
used matrix methods to study the relationships between different sectors in an economy. In this section we will discuss 
some of the ideas developed by Leontief.



Inputs and Outputs in an Economy 

One way to analyze an economy is to divide it into sectors and study how the sectors interact with one another. For 
example, a simple economy might be divided into three sectors — manufacturing, agriculture, and utilities. Typically, a 
sector will produce certain outputs but will require inputs from the other sectors and itself. For example, the agricultural
sector may produce wheat as an output but will require inputs of farm machinery from the manufacturing sector, 
electrical power from the utilities sector, and food from its own sector to feed its workers. Thus, we can imagine an 
economy to be a network in which inputs and outputs flow in and out of the sectors; the study of such flows is called 
input-output analysis. Inputs and outputs are commonly measured in monetary units (dollars or millions of dollars, for 
example) but other units of measurement are also possible. 

The flows between sectors of a real economy are not always obvious. For example, in World War II the United States had 
a demand for 50,000 new airplanes that required the construction of many new aluminum manufacturing plants. This 
produced an unexpectedly large demand for certain copper electrical components, which in turn produced a copper 
shortage. The problem was eventually resolved by using silver borrowed from Fort Knox as a copper substitute. In all 
likelihood modern input-output analysis would have anticipated the copper shortage.

Most sectors of an economy will produce outputs, but there may exist sectors that consume outputs without producing 
anything themselves (the consumer market, for example). Those sectors that do not produce outputs are called open 
sectors. Economies with no open sectors are called closed economies, and economies with one or more open sectors are 
called open economies (Figure 1.9.1). In this section we will be concerned with economies with one open sector, and our 
primary goal will be to determine the output levels that are required for the productive sectors to sustain themselves and 
satisfy the demand of the open sector. 



Figure 1.9.1



Leontief Model of an Open Economy 

Let us consider a simple open economy with one open sector and three product-producing sectors: manufacturing,
agriculture, and utilities. Assume that inputs and outputs are measured in dollars and that the inputs required by the
productive sectors to produce one dollar's worth of output are in accordance with Table 1.

Table 1

                            Income Required per Dollar Output
                            Manufacturing    Agriculture    Utilities
Provider   Manufacturing    $0.50            $0.10          $0.10
           Agriculture      $0.20            $0.50          $0.30
           Utilities        $0.10            $0.30          $0.40




Wassily Leontief (1906-1999) 

Historical Note It is somewhat ironic that it was the Russian-born Wassily Leontief who won the Nobel prize
in 1973 for pioneering the modern methods for analyzing free-market economies. Leontief was a precocious
student who entered the University of Leningrad at age 15. Bothered by the intellectual restrictions of the Soviet 
system, he was put in jail for anti-Communist activities, after which he headed for the University of Berlin, 
receiving his Ph.D. there in 1928. He came to the United States in 1931, where he held professorships at Harvard 
and then New York University. 
[Image: © Bettmann/©Corbis] 



Usually, one would suppress the labeling and express this matrix as

\[
C = \begin{bmatrix}
0.5 & 0.1 & 0.1 \\
0.2 & 0.5 & 0.3 \\
0.1 & 0.3 & 0.4
\end{bmatrix} \tag{1}
\]

This is called the consumption matrix (or sometimes the technology matrix) for the economy. The column vectors

\[
\mathbf{c}_1 = \begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}, \quad
\mathbf{c}_3 = \begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}
\]

in C list the inputs required by the manufacturing, agricultural, and utilities sectors, respectively, to produce $1.00 worth
of output. These are called the consumption vectors of the sectors. For example, \(\mathbf{c}_1\) tells us that to produce $1.00 worth of
output the manufacturing sector needs $0.50 worth of manufacturing output, $0.20 worth of agricultural output, and
$0.10 worth of utilities output.



What is the economic significance of the row sums 
of the consumption matrix? 



Continuing with the above example, suppose that the open sector wants the economy to supply it manufactured goods,
agricultural products, and utilities with dollar values:

\(d_1\) dollars of manufactured goods
\(d_2\) dollars of agricultural products
\(d_3\) dollars of utilities

The column vector d that has these numbers as successive components is called the outside demand vector. Since the
product-producing sectors consume some of their own output, the dollar value of their output must cover their own needs
plus the outside demand. Suppose that the dollar values required to do this are

\(x_1\) dollars of manufactured goods
\(x_2\) dollars of agricultural products
\(x_3\) dollars of utilities

The column vector x that has these numbers as successive components is called the production vector for the economy.
For the economy with consumption matrix (1), that portion of the production vector x that will be consumed by the three
productive sectors is

\[
x_1 \underbrace{\begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by manufacturing}}}
+ x_2 \underbrace{\begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by agriculture}}}
+ x_3 \underbrace{\begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by utilities}}}
= \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = C\mathbf{x}
\]

The vector \(C\mathbf{x}\) is called the intermediate demand vector for the economy. Once the intermediate demand is met, the
portion of the production that is left to satisfy the outside demand is \(\mathbf{x} - C\mathbf{x}\). Thus, if the outside demand vector is d, then
x must satisfy the equation

\[
\underbrace{\mathbf{x}}_{\text{Amount produced}} - \underbrace{C\mathbf{x}}_{\text{Intermediate demand}} = \underbrace{\mathbf{d}}_{\text{Outside demand}}
\]

which we will find convenient to rewrite as

\[ (I - C)\mathbf{x} = \mathbf{d} \tag{2} \]

The matrix \(I - C\) is called the Leontief matrix and (2) is called the Leontief equation.

EXAMPLE 1 Satisfying Outside Demand



Consider the economy described in Table 1. Suppose that the open sector has a demand for $7900 worth of 
manufacturing products, $3950 worth of agricultural products, and $1975 worth of utilities. 

(a) Can the economy meet this demand? 

(b) If so, find a production vector x that will meet it exactly. 



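Solution The consumption matrix, production vector, and outside demand vector are

\[
C = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
\mathbf{d} = \begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix} \tag{3}
\]

To meet the outside demand, the vector x must satisfy the Leontief equation (2), so the problem reduces to
solving the linear system

\[
\underbrace{\begin{bmatrix} 0.5 & -0.1 & -0.1 \\ -0.2 & 0.5 & -0.3 \\ -0.1 & -0.3 & 0.6 \end{bmatrix}}_{I - C}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}}
= \underbrace{\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}}_{\mathbf{d}} \tag{4}
\]

(if consistent). We leave it for you to show that the reduced row echelon form of the augmented matrix for
this system is

\[
\begin{bmatrix}
1 & 0 & 0 & 27{,}500 \\
0 & 1 & 0 & 33{,}750 \\
0 & 0 & 1 & 24{,}750
\end{bmatrix}
\]

This tells us that (4) is consistent, and the economy can satisfy the demand of the open sector exactly by
producing $27,500 worth of manufacturing output, $33,750 worth of agricultural output, and $24,750
worth of utilities output.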
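The production levels found in Example 1 can be checked by computing the intermediate demand Cx and confirming that what remains, x - Cx, equals the outside demand. The following is a minimal sketch with exact fractions; the helper name `mat_vec` is an assumption made for illustration.

```python
from fractions import Fraction as F

# Consumption matrix from Table 1, entered as exact tenths.
C = [
    [F(5, 10), F(1, 10), F(1, 10)],
    [F(2, 10), F(5, 10), F(3, 10)],
    [F(1, 10), F(3, 10), F(4, 10)],
]
x = [27500, 33750, 24750]  # production vector found in Example 1
d = [7900, 3950, 1975]     # outside demand

def mat_vec(A, v):
    """Matrix-vector product, one dot product per row."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

intermediate = mat_vec(C, x)                             # Cx
remaining = [xi - ci for xi, ci in zip(x, intermediate)]  # x - Cx
print([int(v) for v in intermediate])
print(remaining == d)
```

Most of each sector's output is absorbed by the productive sectors themselves, which is why the production levels are several times larger than the outside demand.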



Productive Open Economies 

In the preceding discussion we considered an open economy with three product-producing sectors; the same ideas apply
to an open economy with n product-producing sectors. In this case, the consumption matrix, production vector, and
outside demand vector have the form

\[
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}
\]

where all entries are nonnegative and

\(c_{ij}\) = the monetary value of the output of the ith sector that is needed by the jth sector to produce one unit of output
\(x_i\) = the monetary value of the output of the ith sector
\(d_i\) = the monetary value of the output of the ith sector that is required to meet the demand of the open sector

Remark Note that the jth column vector of C contains the monetary values that the jth sector requires of the other
sectors to produce one monetary unit of output, and the ith row vector of C contains the monetary values required of the
ith sector by the other sectors for each of them to produce one monetary unit of output.

As discussed in our example above, a production vector x that meets the demand d of the outside sector must satisfy the
Leontief equation

\[ (I - C)\mathbf{x} = \mathbf{d} \]

If the matrix \(I - C\) is invertible, then this equation has the unique solution

\[ \mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5} \]

for every demand vector d. However, for x to be a valid production vector it must have nonnegative entries, so the
problem of importance in economics is to determine conditions under which the Leontief equation has a solution with
nonnegative entries.

It is evident from the form of (5) that if \(I - C\) is invertible, and if \((I - C)^{-1}\) has nonnegative entries, then for every
demand vector d the corresponding x will also have nonnegative entries, and hence will be a valid production vector for
the economy. Economies for which \((I - C)^{-1}\) has nonnegative entries are said to be productive. Such economies are
desirable because demand can always be met by some level of production. The following theorem, whose proof can be
found in many books on economics, gives conditions under which open economies are productive.



THEOREM 1.9.1 

If C is the consumption matrix for an open economy, and if all of the column sums are less than 1, then the matrix
\(I - C\) is invertible, the entries of \((I - C)^{-1}\) are nonnegative, and the economy is productive.



Remark The jth column sum of C represents the total dollar value of input that the jth sector requires to produce $1 of
output, so if the jth column sum is less than 1, then the jth sector requires less than $1 of input to produce $1 of output; in
this case we say that the jth sector is profitable. Thus, Theorem 1.9.1 states that if all product-producing sectors of an
open economy are profitable, then the economy is productive. In the exercises we will ask you to show that an open
economy is productive if all of the row sums of C are less than 1 (Exercise 11). Thus, an open economy is productive if
either all of the column sums or all of the row sums of C are less than 1.
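The column-sum and row-sum conditions are easy to test for a given consumption matrix. The sketch below applies both tests to the consumption matrix of Table 1; the helper names `column_sums` and `row_sums` are assumptions made for illustration.

```python
from fractions import Fraction as F

# Consumption matrix from Table 1, entered as exact tenths.
C = [
    [F(5, 10), F(1, 10), F(1, 10)],
    [F(2, 10), F(5, 10), F(3, 10)],
    [F(1, 10), F(3, 10), F(4, 10)],
]

def column_sums(A):
    """Total input each sector needs per dollar of its own output."""
    return [sum(col) for col in zip(*A)]

def row_sums(A):
    """Sum of the entries in each row."""
    return [sum(row) for row in A]

cols = column_sums(C)
rows = row_sums(C)
print([float(s) for s in cols], all(s < 1 for s in cols))
print([float(s) for s in rows], all(s < 1 for s in rows))
```

For this matrix the column test succeeds while the row test does not, which illustrates why the result is stated with "either": one of the two conditions is enough.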



EXAMPLE 2 An Open Economy Whose Sectors Are All Profitable

The column sums of the consumption matrix C in (1) are less than 1, so \((I - C)^{-1}\) exists and has nonnegative
entries. Use a calculating utility to confirm this, and use this inverse to solve Equation 4 in Example 1.

Solution We leave it for you to show that

\[
(I - C)^{-1} \approx \begin{bmatrix}
2.65823 & 1.13924 & 1.01266 \\
1.89873 & 3.67089 & 2.15190 \\
1.39241 & 2.02532 & 2.91139
\end{bmatrix}
\]

This matrix has nonnegative entries, and

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d} \approx
\begin{bmatrix}
2.65823 & 1.13924 & 1.01266 \\
1.89873 & 3.67089 & 2.15190 \\
1.39241 & 2.02532 & 2.91139
\end{bmatrix}
\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\approx \begin{bmatrix} 27{,}500 \\ 33{,}750 \\ 24{,}750 \end{bmatrix}
\]

which is consistent with the solution in Example 1.



Concept Review 

• Sectors 

• Inputs 

• Outputs 

• Input-output analysis 

• Open sector 

• Economies: open, closed 



• Consumption (technology) matrix 

• Consumption vector 

• Outside demand vector 

• Production vector 

• Intermediate demand vector 

• Leontief matrix 

• Leontief equation 

Skills 

• Construct a consumption matrix for an economy. 

• Understand the relationships among the vectors of a sector of an economy: consumption, outside demand, 
production, and intermediate demand. 



Exercise Set 1.9 

1. An automobile mechanic (M) and a body shop (B) use each other's services. For each $1.00 of business that M does, it
uses $0.50 of its own services and $0.25 of B's services, and for each $1.00 of business that B does it uses $0.10 of its
own services and $0.25 of M's services.

(a) Construct a consumption matrix for this economy.

(b) How much must M and B each produce to provide customers with $7000 worth of mechanical work and $14,000
worth of body work?

Answer:

(a) \(\begin{bmatrix} 0.50 & 0.25 \\ 0.25 & 0.10 \end{bmatrix}\)

(b) \(\begin{bmatrix} \$25{,}290 \\ \$22{,}581 \end{bmatrix}\)



2. A simple economy produces food (F) and housing (H). The production of $1.00 worth of food requires $0.30 worth of
food and $0.10 worth of housing, and the production of $1.00 worth of housing requires $0.20 worth of food and
$0.60 worth of housing. 

(a) Construct a consumption matrix for this economy. 

(b) What dollar value of food and housing must be produced for the economy to provide consumers $130,000 worth 
of food and $130,000 worth of housing? 

3. Consider the open economy described by the accompanying table, where the input is in dollars needed for $1.00 of 
output. 

(a) Find the consumption matrix for the economy. 

(b) Suppose that the open sector has a demand for $1930 worth of housing, $3860 worth of food, and $5790 worth of 
utilities. Use row reduction to find a production vector that will meet this demand exactly. 

Table Ex-3

                        Income Required per Dollar Output
                        Housing    Food     Utilities
Provider   Housing      $0.10      $0.60    $0.40
           Food         $0.30      $0.20    $0.30
           Utilities    $0.40      $0.10    $0.20

Answer:

(a) \(\begin{bmatrix} 0.1 & 0.6 & 0.4 \\ 0.3 & 0.2 & 0.3 \\ 0.4 & 0.1 & 0.2 \end{bmatrix}\)

(b) \(\begin{bmatrix} \$31{,}500 \\ \$26{,}500 \\ \$26{,}300 \end{bmatrix}\)



4. A company produces Web design, software, and networking services. View the company as an open economy 
described by the accompanying table, where input is in dollars needed for $1.00 of output. 

(a) Find the consumption matrix for the company. 

(b) Suppose that the customers (the open sector) have a demand for $5400 worth of Web design, $2700 worth of 
software, and $900 worth of networking. Use row reduction to find a production vector that will meet this demand 
exactly. 

Table Ex-4 

                         Input Required per Dollar Output 
                         Web Design   Software   Networking 
Provider   Web Design    $0.40        $0.20      $0.45 
           Software      $0.30        $0.35      $0.30 
           Networking    $0.15        $0.10      $0.20 


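In Exercises 5-6, use matrix inversion to find the production vector x that meets the demand d for the consumption 
matrix C. 

5. 
\[ C = \begin{bmatrix} 0.1 & 0.3 \\ 0.5 & 0.4 \end{bmatrix}; \quad d = \begin{bmatrix} 50 \\ 60 \end{bmatrix} \]

Answer: 

\[ \begin{bmatrix} 123.08 \\ 202.56 \end{bmatrix} \]

6. 
\[ C = \begin{bmatrix} 0.3 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}; \quad d = \begin{bmatrix} 22 \\ 14 \end{bmatrix} \]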



7. Consider an open economy with consumption matrix 



C = 



0 1 



(a) Show that the economy can meet a demand of \(d_1 = 2\) units from the first sector and \(d_2 = 0\) units from the second 
sector, but it cannot meet a demand of \(d_1 = 2\) units from the first sector and \(d_2 = 1\) unit from the second sector. 

(b) Give both a mathematical and an economic explanation of the result in part (a). 
8. Consider an open economy with consumption matrix 

\[ C = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{8} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{8} \end{bmatrix} \]

If the open sector demands the same dollar value from each product-producing sector, which such sector must 
produce the greatest dollar value to meet the demand? 

9. Consider an open economy with consumption matrix 

\[ C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & 0 \end{bmatrix} \]

Show that the Leontief equation \(x - Cx = d\) has a unique solution for every demand vector d if \(c_{21}c_{12} < 1 - c_{11}\). 

10. (a) Consider an open economy with a consumption matrix C whose column sums are less than 1, and let x be the 
production vector that satisfies an outside demand d; that is, \((I - C)^{-1}d = x\). Let \(d_j\) be the demand vector that is 
obtained by increasing the jth entry of d by 1 and leaving the other entries fixed. Prove that the production vector 
\(x_j\) that meets this demand is 

\[ x_j = x + \text{jth column vector of } (I - C)^{-1} \]

(b) In words, what is the economic significance of the jth column vector of \((I - C)^{-1}\)? [Hint: Look at \(x_j - x\).] 

11. Prove: If C is an \(n \times n\) matrix whose entries are nonnegative and whose row sums are less than 1, then \(I - C\) is 
invertible and \((I - C)^{-1}\) has nonnegative entries. [Hint: \((A^T)^{-1} = (A^{-1})^T\) for any invertible matrix A.] 

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) Sectors of an economy that produce outputs are called open sectors. 
Answer: 

False 

(b) A closed economy is an economy that has no open sectors. 
Answer: 

True 

(c) The rows of a consumption matrix represent the outputs in a sector of an economy. 
Answer: 



False 



(d) If the column sums of the consumption matrix are all less than 1, then the Leontief matrix is invertible. 
Answer: 

True 

(e) The Leontief equation relates the production vector for an economy to the outside demand vector. 
Answer: 

True 
Chapter 1 Supplementary Exercises 



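In Exercises 1-4 the given matrix represents an augmented matrix for a linear system. Write the 
corresponding set of linear equations for the system, and use Gaussian elimination to solve the linear system. 
Introduce free parameters as necessary. 

1. 
\[ \begin{bmatrix} 3 & -1 & 0 & 4 & 1 \\ 2 & 0 & 3 & 3 & -1 \end{bmatrix} \]

Answer: 

\[ \begin{aligned} 3x_1 - x_2 + 4x_4 &= 1 \\ 2x_1 + 3x_3 + 3x_4 &= -1 \end{aligned} \]

\[ x_1 = -\tfrac{3}{2}s - \tfrac{3}{2}t - \tfrac{1}{2}, \quad x_2 = -\tfrac{9}{2}s - \tfrac{1}{2}t - \tfrac{5}{2}, \quad x_3 = s, \quad x_4 = t \]

2. 
\[ \begin{bmatrix} 1 & 4 & -1 \\ -2 & -8 & 2 \\ 3 & 12 & -3 \\ 0 & 0 & 0 \end{bmatrix} \]

3. 
\[ \begin{bmatrix} 2 & -4 & 1 & 6 \\ -4 & 0 & 3 & -1 \\ 0 & 1 & -1 & 3 \end{bmatrix} \]

Answer: 

\[ \begin{aligned} 2x_1 - 4x_2 + x_3 &= 6 \\ -4x_1 + 3x_3 &= -1 \\ x_2 - x_3 &= 3 \end{aligned} \]

\[ x_1 = -\tfrac{17}{2}, \quad x_2 = -\tfrac{26}{3}, \quad x_3 = -\tfrac{35}{3} \]

4. 
\[ \begin{bmatrix} 3 & 1 & -2 \\ -9 & -3 & 6 \\ 6 & 2 & 1 \end{bmatrix} \]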


5. Use Gauss-Jordan elimination to solve for x' and y' in terms of x and y. 

x-^x 



Answer: 



i 3 A i 4 3 
^ = + 7 = - + J7 

6. Use Gauss-Jordan elimination to solve for x' and y' in terms of x and y. 



7. Find positive integers that satisfy 



\[ \begin{aligned} x + y + z &= 9 \\ x + 5y + 10z &= 44 \end{aligned} \]



Answer: 



x = 4, y = 2, z = 3 

8. A box containing pennies, nickels, and dimes has 13 coins with a total value of 83 cents. How many coins 
of each type are in the box? 

9. Let 

\[ \begin{bmatrix} a & 0 & b & 2 \\ a & a & 4 & 4 \\ 0 & a & 2 & b \end{bmatrix} \]

be the augmented matrix for a linear system. Find for what values of a and b the system has 

(a) a unique solution. 

(b) a one-parameter solution. 

(c) a two-parameter solution. 

(d) no solution. 

Answer: 

(a) \(a \neq 0,\ b \neq 2\) 

(b) \(a \neq 0,\ b = 2\) 

(c) \(a = 0,\ b = 2\) 

(d) \(a = 0,\ b \neq 2\) 

10. For which value(s) of a does the following system have zero solutions? One solution? Infinitely many 
solutions? 

\[ x_3 = 2 \]




11. Find a matrix K such that AKB = C given that 

1 4 
A= -2 3 
1 -2 




C 



8 < 
6 - 



6 -6 

1 1 
0 0 



Answer: 



K 



-["] 



12. How should the coefficients a, b, and c be chosen so that the system 

ax + by^3z= —3 
'-2x^by + cz= -1 
jr + 3y — cz = — 3 

has the solution \(x = 1\), \(y = -1\), and \(z = 2\)? 

13. In each part, solve the matrix equation for X. 



(a) 

X 

(b) 



-1 0 1 
1 1 0 
3 1 -1 



2 0 
5 



n _i 21 r-5 -1 01 

[3 0 ij [ 6 -3 l\ 



Answer: 



(b) 
(c) 



r-i 3 




[ 6 0 












113 


160 


~ 37 


~ 37 


20 


46 


~37 


~37 



x= 



14. Let A be a square matrix. 

(a) Show that \((I - A)^{-1} = I + A + A^2 + A^3\) if \(A^4 = 0\). 

(b) Show that 

\[ (I - A)^{-1} = I + A + A^2 + \cdots + A^n \]

if \(A^{n+1} = 0\). 

15. Find values of a, b, and c such that the graph of the polynomial \(p(x) = ax^2 + bx + c\) passes through the 
points (1, 2), (-1, 6), and (2, 3). 

Answer: 

\(a = 1,\ b = -2,\ c = 3\) 

16. (Calculus required) Find values of a, b, and c such that the graph of the polynomial 
\(p(x) = ax^2 + bx + c\) passes through the point (-1, 0) and has a horizontal tangent at (2, -9). 
17. Let \(J_n\) be the \(n \times n\) matrix each of whose entries is 1. Show that if \(n > 1\), then 

\[ (I - J_n)^{-1} = I - \frac{1}{n-1}J_n \]

18. Show that if a square matrix A satisfies 

\[ A^3 + 4A^2 - 2A + 7I = 0 \]

then so does \(A^T\). 

19. Prove: If B is invertible, then \(AB^{-1} = B^{-1}A\) if and only if \(AB = BA\). 

20. Prove: If A is invertible, then \(A + B\) and \(I + BA^{-1}\) are both invertible or both not invertible. 

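21. Prove: If A is an \(m \times n\) matrix and B is the \(n \times 1\) matrix each of whose entries is \(1/n\), then 

\[ AB = \begin{bmatrix} \bar{r}_1 \\ \bar{r}_2 \\ \vdots \\ \bar{r}_m \end{bmatrix} \]

where \(\bar{r}_i\) is the average of the entries in the ith row of A. 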
22. (Calculus required) If the entries of the matrix 

Cll(^) ^12(^) 

c= 

are differentiable functions of x, then we define 

c'll(^) c'l2(^) 



■ ■ 

I I 



ci„(t) 
^2mC^) 

■ 



c'2iW c'22(x) 

c'ml(^) C''m2(^) 



(^) 



Show that if the entries in A and B are differentiable functions of x and the sizes of the matrices are such 
that the stated operations can be performed, then 

(c) ^^^-y^dAB^A^ 

aX aX aX 

23. (Calculus required) Use part (c) of Exercise 22 to show that 



dA 



-1 



dx 



dx 



State all the assumptions you make in obtaining this formula. 
24. Assuming that the stated inverses exist, prove the following equalities. 



(b) (/ + C£))-^C = C(/ + Z)C)"^ 
CHAPTER 2 

Determinants 

CHAPTER CONTENTS 

2.1. Determinants by Cofactor Expansion 
2.2. Evaluating Determinants by Row Reduction 
2.3. Properties of Determinants; Cramer's Rule 



INTRODUCTION 



In this chapter we will study "determinants" or, more precisely, "determinant functions." 
Unlike real-valued functions, such as \(f(x) = x^2\), that assign a real number to a real 
variable x, determinant functions assign a real number \(f(A)\) to a matrix variable A. 
Although determinants first arose in the context of solving systems of linear equations, 
they are no longer used for that purpose in real-world applications. Although they can be 
useful for solving very small linear systems (say two or three unknowns), our main 
interest in them stems from the fact that they link together various concepts in linear 
algebra and provide a useful formula for the inverse of a matrix. 
2.1 Determinants by Cofactor Expansion 



In this section we will define the notion of a "determinant." This will enable us to give a specific formula for the inverse of an 
invertible matrix, whereas up to now we have had only a computational procedure for finding it. This, in turn, will eventually 
provide us with a formula for solutions of certain kinds of linear systems. 



Recall from Theorem 1.4.5 that the 2 x 2 matrix 

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is invertible if and only if \(ad - bc \neq 0\) and that the expression \(ad - bc\) is called the determinant of the matrix A. Recall also 
that this determinant is denoted by writing 

\[ \det(A) = ad - bc \quad \text{or} \quad \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \tag{1} \]

and that the inverse of A can be expressed in terms of the determinant as 

\[ A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{2} \]

WARNING 

It is important to keep in mind that det(A) is a number, 
whereas A is a matrix. 
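Formula 2 translates directly into a few lines of code. The following Python sketch is one possible rendering; the function names `det_2x2` and `inverse_2x2` and the sample matrix are illustrative assumptions, and integer entries are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def det_2x2(A):
    """Return ad - bc for A = [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c


def inverse_2x2(A):
    """Return the inverse of a 2 x 2 integer matrix via Formula 2."""
    (a, b), (c, d) = A
    det = det_2x2(A)
    if det == 0:
        raise ValueError("matrix is not invertible: ad - bc = 0")
    # (1 / det(A)) * [[d, -b], [-c, a]]
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[3, 5], [-2, 4]]          # hypothetical sample matrix
print(det_2x2(A))              # 22
print(inverse_2x2(A))
```

Note that `det_2x2` returns a number while `inverse_2x2` returns a matrix, which mirrors the distinction between det(A) and A.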



Minors and Cofactors 



One of our main goals in this chapter is to obtain an analog of Formula 2 that is applicable to square matrices of all orders. For 
this purpose we will find it convenient to use subscripted entries when writing matrices or determinants. Thus, if we denote a 
2 x 2 matrix as 

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]

then the two equations in (1) take the form 

\[ \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{3} \]

We define the determinant of a 1 x 1 matrix \(A = [a_{11}]\) 
as \(\det[A] = \det[a_{11}] = a_{11}\). 


The following definition will be key to our goal of extending the definition of a determinant to higher order matrices. 

DEFINITION 1 

If A is a square matrix, then the minor of entry \(a_{ij}\) is denoted by \(M_{ij}\) and is defined to be the determinant of the 
submatrix that remains after the ith row and jth column are deleted from A. The number \((-1)^{i+j}M_{ij}\) is denoted by 
\(C_{ij}\) and is called the cofactor of entry \(a_{ij}\). 



EXAMPLE 1 Finding Minors and Cofactors 

Let 

\[ A = \begin{bmatrix} 3 & 1 & -4 \\ 2 & 5 & 6 \\ 1 & 4 & 8 \end{bmatrix} \]

WARNING 

We have followed the standard convention of 
using capital letters to denote minors and cofactors 
even though they are numbers, not matrices. 

The minor of entry \(a_{11}\) is 

\[ M_{11} = \begin{vmatrix} 5 & 6 \\ 4 & 8 \end{vmatrix} = 16 \]

The cofactor of \(a_{11}\) is 

\[ C_{11} = (-1)^{1+1}M_{11} = M_{11} = 16 \]

Similarly, the minor of entry \(a_{32}\) is 

\[ M_{32} = \begin{vmatrix} 3 & -4 \\ 2 & 6 \end{vmatrix} = 26 \]

The cofactor of \(a_{32}\) is 

\[ C_{32} = (-1)^{3+2}M_{32} = -M_{32} = -26 \]
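The computations of Example 1 can be mirrored in code. The sketch below, restricted to 3 x 3 matrices so that every minor is a 2 x 2 determinant, uses hypothetical helper names `minor` and `cofactor` with 1-based row and column indices to match the subscripts in the text.

```python
def minor(A, i, j):
    """Minor M_ij of a 3 x 3 matrix A (i, j are 1-based, as in the text)."""
    # Delete row i and column j, leaving a 2 x 2 submatrix.
    rows = [row for r, row in enumerate(A, start=1) if r != i]
    sub = [[x for c, x in enumerate(row, start=1) if c != j] for row in rows]
    (a, b), (c, d) = sub
    return a * d - b * c


def cofactor(A, i, j):
    """Cofactor C_ij = (-1)^(i+j) * M_ij."""
    return (-1) ** (i + j) * minor(A, i, j)


A = [[3, 1, -4],
     [2, 5, 6],
     [1, 4, 8]]
print(minor(A, 1, 1), cofactor(A, 1, 1))   # 16 16
print(minor(A, 3, 2), cofactor(A, 3, 2))   # 26 -26
```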



Historical Note The term determinant was first introduced by the German mathematician Carl Friedrich 
Gauss in 1801 (see p. 15), who used them to "determine" properties of certain kinds of functions. 
Interestingly, the term matrix is derived from a Latin word for "womb" because it was viewed as a container 
of determinants. 



Historical Note The term minor is apparently due to the English mathematician James Sylvester (see p. 
34), who wrote the following in a paper published in 1850: "Now conceive any one line and any one column 
be struck out, we get. . . a square, one term less in breadth and depth than the original square; and by varying 
in every possible selection of the line and column excluded, we obtain, supposing the original square to 
consist of n lines and n columns, n^2 such minor squares, each of which will represent what I term a "First 
Minor Determinant" relative to the principal or complete determinant." 



Remark Note that a minor \(M_{ij}\) and its corresponding cofactor \(C_{ij}\) are either the same or negatives of each other and that the 
relating sign \((-1)^{i+j}\) is either +1 or -1 in accordance with the pattern in the "checkerboard" array 

\[ \begin{bmatrix} + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \end{bmatrix} \]

For example, 

\[ C_{11} = M_{11}, \quad C_{21} = -M_{21}, \quad C_{22} = M_{22} \]

and so forth. Thus, it is never really necessary to calculate \((-1)^{i+j}\) to calculate \(C_{ij}\); you can simply compute the minor \(M_{ij}\) 
and then adjust the sign in accordance with the checkerboard pattern. Try this in Example 1. 
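The checkerboard pattern is easy to generate for any size. In this small Python sketch, the function name `checkerboard` is an illustrative assumption; it uses 0-based indices, which give the same parity of i + j as the 1-based subscripts in the text.

```python
def checkerboard(n):
    """Return the n x n array of cofactor signs as strings of '+' and '-'."""
    return ["".join("+" if (i + j) % 2 == 0 else "-" for j in range(n))
            for i in range(n)]


for row in checkerboard(4):
    print(row)
# +-+-
# -+-+
# +-+-
# -+-+
```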



EXAMPLE 2 Cofactor Expansions of a 2 x 2 Matrix 

The checkerboard pattern for a 2 x 2 matrix \(A = [a_{ij}]\) is 

\[ \begin{bmatrix} + & - \\ - & + \end{bmatrix} \]

so that 

\[ C_{11} = M_{11} = a_{22} \qquad C_{12} = -M_{12} = -a_{21} \]

\[ C_{21} = -M_{21} = -a_{12} \qquad C_{22} = M_{22} = a_{11} \]

We leave it for you to use Formula 3 to verify that det(A) can be expressed in terms of cofactors in the following 
four ways: 

\[ \begin{aligned} \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} &= a_{11}C_{11} + a_{12}C_{12} \\ &= a_{21}C_{21} + a_{22}C_{22} \\ &= a_{11}C_{11} + a_{21}C_{21} \\ &= a_{12}C_{12} + a_{22}C_{22} \end{aligned} \tag{4} \]

Each of the last four equations is called a cofactor expansion of det(A). In each cofactor expansion the entries and 
cofactors all come from the same row or same column of A. For example, in the first equation the entries and 
cofactors all come from the first row of A, in the second they all come from the second row of A, in the third they all 
come from the first column of A, and in the fourth they all come from the second column of A. 



Definition of a General Determinant 

Formula 4 is a special case of the following general result, which we will state without proof. 



THEOREM 2.1.1 



If A is an \(n \times n\) matrix, then regardless of which row or column of A is chosen, the number obtained by multiplying the 
entries in that row or column by the corresponding cofactors and adding the resulting products is always the same. 



This result allows us to make the following definition. 

DEFINITION 2 

If A is an \(n \times n\) matrix, then the number obtained by multiplying the entries in any row or column of A by the 
corresponding cofactors and adding the resulting products is called the determinant of A, and the sums themselves are 
called cofactor expansions of A. That is, 

\[ \det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \tag{5} \]
[cofactor expansion along the jth column] 

and 

\[ \det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \tag{6} \]
[cofactor expansion along the ith row] 
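Formulas 5 and 6 describe a recursive procedure: each cofactor is itself a smaller determinant. The Python sketch below is one way to express that recursion; the names `submatrix`, `det`, `expand_along_row`, and `expand_along_column` are illustrative assumptions, and indices are 0-based (the sign \((-1)^{i+j}\) is unaffected by that shift). It is meant for small matrices only, since the work grows very quickly with n.

```python
def submatrix(A, i, j):
    """Delete row i and column j (0-based) from A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # Zero entries contribute nothing, so their cofactors are skipped.
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j))
               for j in range(n) if A[0][j] != 0)


def expand_along_row(A, i):
    """Formula 6: sum of a_ij * C_ij over the entries of row i."""
    return sum((-1) ** (i + j) * A[i][j] * det(submatrix(A, i, j))
               for j in range(len(A)))


def expand_along_column(A, j):
    """Formula 5: sum of a_ij * C_ij over the entries of column j."""
    return sum((-1) ** (i + j) * A[i][j] * det(submatrix(A, i, j))
               for i in range(len(A)))


A = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]
values = [expand_along_row(A, i) for i in range(3)]
values += [expand_along_column(A, j) for j in range(3)]
print(values)  # [-1, -1, -1, -1, -1, -1]
```

All six expansions agree, as Theorem 2.1.1 asserts.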



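EXAMPLE 3 Cofactor Expansion Along the First Row 

Find the determinant of the matrix 

\[ A = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix} \]

by cofactor expansion along the first row. 

Solution 

\[ \begin{aligned} \det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} &= 3\begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - 1\begin{vmatrix} -2 & 3 \\ 5 & -2 \end{vmatrix} + 0\begin{vmatrix} -2 & -4 \\ 5 & 4 \end{vmatrix} \\ &= 3(-4) - (1)(-11) + 0 = -1 \end{aligned} \]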



EXAMPLE 4 Cofactor Expansion Along the First Column 

Let A be the matrix in Example 3, and evaluate det(A) by cofactor expansion along the first column of A. 

Solution 

\[ \begin{aligned} \det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} &= 3\begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - (-2)\begin{vmatrix} 1 & 0 \\ 4 & -2 \end{vmatrix} + 5\begin{vmatrix} 1 & 0 \\ -4 & 3 \end{vmatrix} \\ &= 3(-4) - (-2)(-2) + 5(3) = -1 \end{aligned} \]

This agrees with the result obtained in Example 3. 

Note that in Example 4 we had to compute three 
cofactors, whereas in Example 3 only two were 
needed because the third was multiplied by zero. 
As a rule, the best strategy for cofactor 
expansion is to expand along a row or column 
with the most zeros. 




Charles Lutwidge Dodgson (Lewis Carroll) (1832-1898) 

Historical Note Cofactor expansion is not the only method for expressing the determinant of a matrix 
in terms of determinants of lower order. For example, although it is not well known, the English 
mathematician Charles Dodgson, who was the author of Alice's Adventures in Wonderland and Through 
the Looking Glass under the pen name of Lewis Carroll, invented such a method, called "condensation." 
That method has recently been resurrected from obscurity because of its suitability for parallel 
processing on computers. 

[Image: Time & Life Pictures/Getty Images, Inc.] 



EXAMPLE 5 Smart Choice of Row or Column 

If A is the 4 x 4 matrix 

\[ A = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 3 & 1 & 2 & 2 \\ 1 & 0 & -2 & 1 \\ 2 & 0 & 0 & 1 \end{bmatrix} \]

then to find det(A) it will be easiest to use cofactor expansion along the second column, since it has the most zeros: 

\[ \det(A) = 1 \cdot \begin{vmatrix} 1 & 0 & -1 \\ 1 & -2 & 1 \\ 2 & 0 & 1 \end{vmatrix} \]

For the 3 x 3 determinant, it will be easiest to use cofactor expansion along its second column, since it has the most 
zeros: 

\[ \det(A) = 1 \cdot (-2) \cdot \begin{vmatrix} 1 & -1 \\ 2 & 1 \end{vmatrix} = -2(1 + 2) = -6 \]



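EXAMPLE 6 Determinant of a Lower Triangular Matrix 

The following computation shows that the determinant of a 4 x 4 lower triangular matrix is the product of its 
diagonal entries. Each part of the computation uses a cofactor expansion along the first row. 

\[ \begin{aligned} \begin{vmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix} &= a_{11}\begin{vmatrix} a_{22} & 0 & 0 \\ a_{32} & a_{33} & 0 \\ a_{42} & a_{43} & a_{44} \end{vmatrix} \\ &= a_{11}a_{22}\begin{vmatrix} a_{33} & 0 \\ a_{43} & a_{44} \end{vmatrix} \\ &= a_{11}a_{22}a_{33}\,|a_{44}| = a_{11}a_{22}a_{33}a_{44} \end{aligned} \]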



The method illustrated in Example 6 can be easily adapted to prove the following general result. 



THEOREM 2.1.2 

If A is an \(n \times n\) triangular matrix (upper triangular, lower triangular, or diagonal), then det(A) is the product of the 
entries on the main diagonal of the matrix; that is, \(\det(A) = a_{11}a_{22} \cdots a_{nn}\). 
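Theorem 2.1.2 means that a triangular determinant needs no expansion at all. A minimal Python sketch, assuming a square list-of-lists matrix and using the hypothetical name `det_triangular`, first checks that the matrix really is triangular and then multiplies the diagonal entries.

```python
from math import prod


def det_triangular(A):
    """Determinant of a triangular matrix as the product of its diagonal."""
    n = len(A)
    # Upper triangular: every entry below the main diagonal is zero.
    upper = all(A[i][j] == 0 for i in range(n) for j in range(i))
    # Lower triangular: every entry above the main diagonal is zero.
    lower = all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n))
    if not (upper or lower):
        raise ValueError("matrix is not triangular")
    return prod(A[i][i] for i in range(n))


U = [[1, 1, 1, 1], [0, 2, 2, 2], [0, 0, 3, 3], [0, 0, 0, 4]]
L = [[-3, 0, 0, 0], [1, 2, 0, 0], [40, 10, -1, 0], [100, 200, -23, 3]]
print(det_triangular(U), det_triangular(L))  # 24 18
```

The triangularity check matters: applying the diagonal product to a matrix that is not triangular would generally give a wrong value.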



A Useful Technique for Evaluating 2x2 and 3x3 Determinants 

Determinants of 2 x 2 and 3 x 3 matrices can be evaluated very efficiently using the pattern suggested in Figure 2.1.1. 

Figure 2.1.1 

In the 2 x 2 case, the determinant can be computed by forming the product of the entries on the rightward arrow and 
subtracting the product of the entries on the leftward arrow. In the 3 x 3 case we first recopy the first and second columns as 
shown in the figure, after which we can compute the determinant by summing the products of the entries on the rightward 
arrows and subtracting the products on the leftward arrows. These procedures execute the computations 

\[ \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \]

\[ \begin{aligned} \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\ &= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\ &= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} \end{aligned} \]

which agrees with the cofactor expansions along the first row. 

WARNING 

The arrow technique only works for determinants of 
2 x 2 and 3 x 3 matrices. 
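The arrow technique for the 3 x 3 case amounts to three "rightward" products minus three "leftward" products. The Python sketch below spells that out; the function name `det_arrows_3x3` is an illustrative assumption, and the sample matrix was chosen here so that its six products are 45, 84, 96 and 105, -48, -72.

```python
def det_arrows_3x3(A):
    """3 x 3 determinant by the arrow technique (valid only for 3 x 3)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = A
    # Products along the three rightward (downward-right) arrows.
    rightward = a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
    # Products along the three leftward (downward-left) arrows.
    leftward = a13 * a22 * a31 + a11 * a23 * a32 + a12 * a21 * a33
    return rightward - leftward


A = [[1, 2, 3], [-4, 5, 6], [7, -8, 9]]   # sample matrix (an assumption)
print(det_arrows_3x3(A))  # (45 + 84 + 96) - (105 - 48 - 72) = 240
```

The tuple unpacking fails for any matrix that is not 3 x 3, which fits the warning that the technique does not extend to larger determinants.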

EXAMPLE 7 A Technique for Evaluating 2x2 and 3x3 Determinants 




= [45 + 84 + 96] - [105 - 48 - 72] = 240 



Concept Review 

• Determinant 

• Minor 

• Cofactor 

• Cofactor expansion 

Skills 

• Find the minors and cofactors of a square matrix. 

• Use cofactor expansion to evaluate the determinant of a square matrix. 

• Use the arrow technique to evaluate the determinant of a 2 x 2 or 3 x 3 matrix. 

• Use the determinant of a 2 x 2 invertible matrix to find the inverse of that matrix. 

• Find the determinant of an upper triangular, lower triangular, or diagonal matrix by inspection. 



Exercise Set 2.1 



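In Exercises 1-2, find all the minors and cofactors of the matrix A. 

1. 
\[ A = \begin{bmatrix} 1 & -2 & 3 \\ 6 & 7 & -1 \\ -3 & 1 & 4 \end{bmatrix} \]

Answer: 

\[ \begin{aligned} M_{11} &= 29, & C_{11} &= 29 \\ M_{12} &= 21, & C_{12} &= -21 \\ M_{13} &= 27, & C_{13} &= 27 \\ M_{21} &= -11, & C_{21} &= 11 \\ M_{22} &= 13, & C_{22} &= 13 \\ M_{23} &= -5, & C_{23} &= 5 \\ M_{31} &= -19, & C_{31} &= -19 \\ M_{32} &= -19, & C_{32} &= 19 \\ M_{33} &= 19, & C_{33} &= 19 \end{aligned} \]

2. 
\[ A = \begin{bmatrix} 1 & 1 & 2 \\ 3 & 3 & 6 \\ 0 & 1 & 4 \end{bmatrix} \]

3. Let 

\[ A = \begin{bmatrix} 4 & -1 & 1 & 6 \\ 0 & 0 & -3 & 3 \\ 4 & 1 & 0 & 14 \\ 4 & 1 & 3 & 2 \end{bmatrix} \]

Find 

(a) \(M_{13}\) and \(C_{13}\). 

(b) \(M_{23}\) and \(C_{23}\). 

(c) \(M_{22}\) and \(C_{22}\). 

(d) \(M_{21}\) and \(C_{21}\). 

Answer: 

(a) \(M_{13} = 0,\ C_{13} = 0\) 

(b) \(M_{23} = -96,\ C_{23} = 96\) 

(c) \(M_{22} = -48,\ C_{22} = -48\) 

(d) \(M_{21} = 72,\ C_{21} = -72\) 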



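4. Let 

\[ A = \begin{bmatrix} 2 & 3 & -1 & 1 \\ -3 & 2 & 0 & 3 \\ 3 & -2 & 1 & 0 \\ 3 & -2 & 1 & 4 \end{bmatrix} \]

Find 

(a) \(M_{32}\) and \(C_{32}\). 

(b) \(M_{44}\) and \(C_{44}\). 

(c) \(M_{41}\) and \(C_{41}\). 

(d) \(M_{24}\) and \(C_{24}\). 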

In Exercises 5-8, evaluate the determinant of the given matrix. If the matrix is invertible, use Equation 2 to find its inverse. 
5. 



Answer: 

\[ 22; \quad \begin{bmatrix} \tfrac{2}{11} & -\tfrac{5}{22} \\ \tfrac{1}{11} & \tfrac{3}{22} \end{bmatrix} \]



Is I] 



Answer: 



59; 



_2_ 
■59 

J_ 
59 



1_ 
■59 

5_ 
■59 



8. 



4 {i 



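In Exercises 9-14, use the arrow technique to evaluate the determinant of the given matrix. 

9. 
\[ \begin{bmatrix} a-3 & 5 \\ -3 & a-2 \end{bmatrix} \]

Answer: 

\(a^2 - 5a + 21\) 

10. 
\[ \begin{bmatrix} -2 & 7 & 6 \\ 5 & 1 & -2 \\ 3 & 8 & 4 \end{bmatrix} \]

11. 
\[ \begin{bmatrix} -2 & 1 & 4 \\ 3 & 5 & -7 \\ 1 & 6 & 2 \end{bmatrix} \]

Answer: 
-65 

12. 
\[ \begin{bmatrix} -1 & 1 & 2 \\ 3 & 0 & -5 \\ 1 & 7 & 2 \end{bmatrix} \]

13. 
\[ \begin{bmatrix} 3 & 0 & 0 \\ 2 & -1 & 5 \\ 1 & 9 & -4 \end{bmatrix} \]

Answer: 
-123 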



14. 



c -4 3 
2 1 

4 c-1 2 



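In Exercises 15-18, find all values of \(\lambda\) for which det(A) = 0. 

15. 
\[ A = \begin{bmatrix} \lambda-2 & 1 \\ -5 & \lambda+4 \end{bmatrix} \]

Answer: 
\(\lambda = 1\) or \(-3\) 

16. 
\[ A = \begin{bmatrix} \lambda-4 & 0 & 0 \\ 0 & \lambda & 2 \\ 0 & 3 & \lambda-1 \end{bmatrix} \]

17. 
\[ A = \begin{bmatrix} \lambda-1 & 0 \\ 2 & \lambda+1 \end{bmatrix} \]

Answer: 
\(\lambda = 1\) or \(-1\) 

18. 
\[ A = \begin{bmatrix} \lambda-4 & 4 & 0 \\ -1 & \lambda & 0 \\ 0 & 0 & \lambda-5 \end{bmatrix} \]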



19. Evaluate the determinant of the matrix in Exercise 13 by a cofactor expansion along 

(a) the first row. 

(b) the first column. 

(c) the second row. 

(d) the second column. 

(e) the third row. 

(f) the third column. 

Answer: 
(all parts) -123 

20. Evaluate the determinant of the matrix in Exercise 12 by a cofactor expansion along 

(a) the first row. 

(b) the first column. 

(c) the second row. 

(d) the second column. 

(e) the third row. 

(f) the third column. 

In Exercises 21-26, evaluate det(A) by a cofactor expansion along a row or column of your choice. 

21. 
\[ A = \begin{bmatrix} -3 & 0 & 7 \\ 2 & 5 & 1 \\ -1 & 0 & 5 \end{bmatrix} \]

Answer: 

-40 



22. 



i4 = 



3 3 
1 0 
1 -3 



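23. 
\[ A = \begin{bmatrix} 1 & k & k^2 \\ 1 & k & k^2 \\ 1 & k & k^2 \end{bmatrix} \]

Answer: 

0 

24. 
\[ A = \begin{bmatrix} k+1 & k-1 & 7 \\ 2 & k-3 & 4 \\ 5 & k+1 & k \end{bmatrix} \]

25. 
\[ A = \begin{bmatrix} 3 & 3 & 0 & 5 \\ 2 & 2 & 0 & -2 \\ 4 & 1 & -3 & 0 \\ 2 & 10 & 3 & 2 \end{bmatrix} \]

Answer: 

-240 

26. 
\[ A = \begin{bmatrix} 4 & 0 & 0 & 1 & 0 \\ 3 & 3 & 3 & -1 & 0 \\ 1 & 2 & 4 & 2 & 3 \\ 9 & 4 & 6 & 2 & 3 \\ 2 & 2 & 4 & 2 & 3 \end{bmatrix} \]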

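In Exercises 27-32, evaluate the determinant of the given matrix by inspection. 

27. 
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

Answer: 

-1 

28. 
\[ \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]

29. 
\[ \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 4 & 3 & 0 \\ 1 & 2 & 3 & 8 \end{bmatrix} \]

Answer: 

0 

30. 
\[ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 4 \end{bmatrix} \]

31. 
\[ \begin{bmatrix} 1 & 2 & 7 & -3 \\ 0 & 1 & -4 & 1 \\ 0 & 0 & 2 & 7 \\ 0 & 0 & 0 & 3 \end{bmatrix} \]

Answer: 

6 

32. 
\[ \begin{bmatrix} -3 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 40 & 10 & -1 & 0 \\ 100 & 200 & -23 & 3 \end{bmatrix} \]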



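33. Show that the value of the following determinant is independent of \(\theta\). 

\[ \begin{vmatrix} \sin\theta & \cos\theta & 0 \\ -\cos\theta & \sin\theta & 0 \\ \sin\theta - \cos\theta & \sin\theta + \cos\theta & 1 \end{vmatrix} \]

Answer: 

The determinant is \(\sin^2\theta + \cos^2\theta = 1\). 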



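34. Show that the matrices 

\[ A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} d & e \\ 0 & f \end{bmatrix} \]

commute if and only if 

\[ \begin{vmatrix} b & a-c \\ e & d-f \end{vmatrix} = 0 \]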



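35. By inspection, what is the relationship between the following determinants? 

\[ d_1 = \begin{vmatrix} a & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix} \quad \text{and} \quad d_2 = \begin{vmatrix} a+\lambda & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix} \]

Answer: 

\(d_2 = d_1 + \lambda\) 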
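36. Show that 

\[ \det(A) = \frac{1}{2}\begin{vmatrix} \operatorname{tr}(A) & 1 \\ \operatorname{tr}(A^2) & \operatorname{tr}(A) \end{vmatrix} \]

for every 2 x 2 matrix A. 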

37. What can you say about an nth-order determinant all of whose entries are 1? Explain your reasoning. 

38. What is the maximum number of zeros that a 3 x 3 matrix can have without having a zero determinant? Explain your 
reasoning. 

39. What is the maximum number of zeros that a 4 x 4 matrix can have without having a zero determinant? Explain your 
reasoning. 

40. Prove that \((x_1, y_1)\), \((x_2, y_2)\), \((x_3, y_3)\) are collinear points if and only if 

\[ \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = 0 \]

41. Prove that the equation of the line through the distinct points \((a_1, b_1)\) and \((a_2, b_2)\) can be written as 

\[ \begin{vmatrix} x & y & 1 \\ a_1 & b_1 & 1 \\ a_2 & b_2 & 1 \end{vmatrix} = 0 \]

42. Prove that if A is upper triangular and \(B_{ij}\) is the matrix that results when the ith row and jth column of A are deleted, then 
\(B_{ij}\) is upper triangular if \(i < j\). 

True-False Exercises 

In parts (a)-(i) determine whether the statement is true or false, and justify your answer. 

(a) The determinant of the 2 x 2 matrix \(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\) is \(ad + bc\). 

Answer: 

False 

(b) Two square matrices A and B can have the same determinant only if they are the same size. 
Answer: 

False 

(c) The minor \(M_{ij}\) is the same as the cofactor \(C_{ij}\) if and only if \(i + j\) is even. 
Answer: 

True 

(d) If A is a 3 x 3 symmetric matrix, then \(C_{ij} = C_{ji}\) for all i and j. 
Answer: 

True 

(e) The value of a cofactor expansion of a matrix A is independent of the row or column chosen for the expansion. 
Answer: 

True 

(f) The determinant of a lower triangular matrix is the sum of the entries along its main diagonal. 
Answer: 

False 

(g) For every square matrix A and every scalar c, we have det(cA) = c det(A) . 
Answer: 

False 

(h) For all square matrices A and B, we have det(A + B) = det(A) + det(B). 
Answer: 



False 



(i) For every 2 x 2 matrix A, we have \(\det(A^2) = (\det(A))^2\). 
Answer: 

True 
2.2 Evaluating Determinants by Row Reduction 



In this section we will show how to evaluate a determinant by reducing the associated matrix to row echelon form. In 
general, this method requires less computation than cofactor expansion and hence is the method of choice for large 
matrices. 



A Basic Theorem 

We begin with a fundamental theorem that will lead us to an efficient procedure for evaluating the determinant of a square 
matrix of any size. 

THEOREM 2.2.1 

Let A be a square matrix. If A has a row of zeros or a column of zeros, then det(A) = 0. 

Proof Since the determinant of A can be found by a cofactor expansion along any row or column, we can use the row or 
column of zeros. Thus, if we let \(C_1, C_2, \ldots, C_n\) denote the cofactors of A along that row or column, then it follows from 
Formula 5 or 6 in Section 2.1 that 

\[ \det(A) = 0 \cdot C_1 + 0 \cdot C_2 + \cdots + 0 \cdot C_n = 0 \]

The following useful theorem relates the determinant of a matrix and the determinant of its transpose. 



THEOREM 2.2.2 

Let A be a square matrix. Then \(\det(A) = \det(A^T)\). 

Because transposing a matrix changes its columns to 
rows and its rows to columns, almost every theorem 
about the rows of a determinant has a companion 
version about columns, and vice versa. 

Proof Since transposing a matrix changes its columns to rows and its rows to columns, the cofactor expansion of A 
along any row is the same as the cofactor expansion of \(A^T\) along the corresponding column. Thus, both have the same 
determinant. 



Elementary Row Operations 



The next theorem shows how an elementary row operation on a square matrix affects the value of its determinant. In 
place of a formal proof we have provided a table to illustrate the ideas in the 3 x 3 case (see Table 1). 

THEOREM 2.2.3 

Let A be an \(n \times n\) matrix. 

(a) If B is the matrix that results when a single row or single column of A is multiplied by a scalar k, then 
\(\det(B) = k\det(A)\). 

(b) If B is the matrix that results when two rows or two columns of A are interchanged, then \(\det(B) = -\det(A)\). 

(c) If B is the matrix that results when a multiple of one row of A is added to another row or when a multiple of 
one column is added to another column, then \(\det(B) = \det(A)\). 

The first panel of Table 1 shows that you can bring a 
common factor from any row (column) of a 
determinant through the determinant sign. This is a 
slightly different way of thinking about part {a) of 
Theorem 2.2.3. 
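The three parts of Theorem 2.2.3 can be observed numerically on a small matrix. In the Python sketch below, the helper names (`det`, `scale_row`, `swap_rows`, `add_multiple`) and the sample matrix are illustrative assumptions; the determinant is computed by cofactor expansion along the first row, which is adequate for a 3 x 3 check.

```python
def det(A):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(sub)
    return total


def scale_row(A, i, k):
    """Multiply row i by the scalar k."""
    return [[k * x for x in row] if r == i else list(row) for r, row in enumerate(A)]


def swap_rows(A, i, j):
    """Interchange rows i and j."""
    B = [list(row) for row in A]
    B[i], B[j] = B[j], B[i]
    return B


def add_multiple(A, src, dst, k):
    """Add k times row src to row dst."""
    B = [list(row) for row in A]
    B[dst] = [x + k * y for x, y in zip(B[dst], B[src])]
    return B


A = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]      # det(A) = -1
print(det(scale_row(A, 0, 4)))                # -4 = 4 * det(A)
print(det(swap_rows(A, 0, 1)))                # 1 = -det(A)
print(det(add_multiple(A, 1, 0, 7)))          # -1 = det(A)
```

By Theorem 2.2.2, the same three checks applied to the transpose would illustrate the column versions of the statements.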



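Table 1 

Relationship: 
\[ \begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = k\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = k\det(A) \]
Operation: The first row of A is multiplied by k. 

Relationship: 
\[ \begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = -\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = -\det(A) \]
Operation: The first and second rows of A are interchanged. 

Relationship: 
\[ \begin{vmatrix} a_{11}+ka_{21} & a_{12}+ka_{22} & a_{13}+ka_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = \det(A) \]
Operation: A multiple of the second row of A is added to the first row. 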


We will verify the first equation in Table 1 and leave the other two for you. To start, note that the determinants on the two 
sides of the equation differ only in the first row, so these determinants have the same cofactors, \(C_{11}, C_{12}, C_{13}\), along that 
row (since those cofactors depend only on the entries in the second two rows). Thus, expanding the left side by cofactors 
along the first row yields 

\[ \begin{aligned} \begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= ka_{11}C_{11} + ka_{12}C_{12} + ka_{13}C_{13} \\ &= k(a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}) \\ &= k\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \end{aligned} \]



Elementary Matrices 

It will be useful to consider the special case of Theorem 2.2.3 in which \(A = I_n\) is the \(n \times n\) identity matrix and E (rather 
than B) denotes the elementary matrix that results when the row operation is performed on \(I_n\). In this special case 
Theorem 2.2.3 implies the following result. 

THEOREM 2.2.4 

Let E be an \(n \times n\) elementary matrix. 

(a) If E results from multiplying a row of \(I_n\) by a nonzero number k, then det(E) = k. 

(b) If E results from interchanging two rows of \(I_n\), then det(E) = -1. 

(c) If E results from adding a multiple of one row of \(I_n\) to another, then det(E) = 1. 

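EXAMPLE 1 Determinants of Elementary Matrices 

The following determinants of elementary matrices, which are evaluated by inspection, illustrate Theorem 
2.2.4. 

\[ \begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 3, \qquad \begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix} = -1, \qquad \begin{vmatrix} 1 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 1 \]

In the first, the second row of \(I_4\) was multiplied by 3. In the second, the first and last rows of \(I_4\) were interchanged. 
In the third, 7 times the last row of \(I_4\) was added to the first row. 

Observe that the determinant of an elementary 
matrix cannot be zero. 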



Matrices with Proportional Rows or Columns 

If a square matrix A has two proportional rows, then a row of zeros can be introduced by adding a suitable multiple of one 
of the rows to the other. Similarly for columns. But adding a multiple of one row or column to another does not change 
the determinant, so from Theorem 2.2.1, we must have det(A) = 0. This proves the following theorem. 

THEOREM 2.2.5 

If A is a square matrix with two proportional rows or two proportional columns, then det(A) = 0. 



EXAMPLE 2 Introducing Zero Rows 

The following computation shows how to introduce a row of zeros when there are two proportional rows. 

\[ \begin{vmatrix} 1 & 3 & -2 & 4 \\ 2 & 6 & -4 & 8 \\ 3 & 9 & 1 & 5 \\ 1 & 1 & 4 & 8 \end{vmatrix} = \begin{vmatrix} 1 & 3 & -2 & 4 \\ 0 & 0 & 0 & 0 \\ 3 & 9 & 1 & 5 \\ 1 & 1 & 4 & 8 \end{vmatrix} = 0 \]

The second row is 2 times the first, so we added -2 times the first row to the second to 
introduce a row of zeros. 



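Each of the following matrices has two proportional rows or columns; thus, each has a determinant of zero. 

\[ \begin{bmatrix} -1 & 4 \\ -2 & 8 \end{bmatrix}, \qquad \begin{bmatrix} 1 & -2 & 7 \\ -4 & 8 & 5 \\ 2 & -4 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 3 & -1 & 4 & -5 \\ 6 & -2 & 5 & 2 \\ 5 & 8 & 1 & 4 \\ -9 & 3 & -12 & 15 \end{bmatrix} \]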

Evaluating Determinants by Row Reduction 



We will now give a method for evaluating determinants that involves substantially less computation than cofactor 
expansion. The idea of the method is to reduce the given matrix to upper triangular form by elementary row operations, 
then compute the determinant of the upper triangular matrix (an easy computation), and then relate that determinant to 
that of the original matrix. Here is an example. 

EXAMPLE 3 Using Row Reduction to Evaluate a Determinant 

Evaluate det(A) where 

\[ A = \begin{bmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{bmatrix} \]

Solution We will reduce A to row echelon form (which is upper triangular) and then apply Theorem 
2.1.2. 

Even with today's fastest computers it would 
take millions of years to calculate a 25 x 25 
determinant by cofactor expansion, so 
methods based on row reduction are often 
used for large determinants. For determinants 
of small size (such as those in this text), 
cofactor expansion is often a reasonable 
choice. 

\[ \begin{aligned} \det(A) = \begin{vmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{vmatrix} &= -\begin{vmatrix} 3 & -6 & 9 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} && \text{The first and second rows of } A \text{ were interchanged.} \\ &= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} && \text{A common factor of 3 from the first row was taken through the determinant sign.} \\ &= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 10 & -5 \end{vmatrix} && -2 \text{ times the first row was added to the third row.} \\ &= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & -55 \end{vmatrix} && -10 \text{ times the second row was added to the third row.} \\ &= (-3)(-55)\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{vmatrix} && \text{A common factor of } -55 \text{ from the last row was taken through the determinant sign.} \\ &= (-3)(-55)(1) = 165 \end{aligned} \]
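The procedure of Example 3 can be automated. The Python sketch below is one possible variant, not the exact sequence of steps in the text: instead of pulling common factors through the determinant sign, it reduces the matrix to upper triangular form using only row interchanges and row additions, tracks the sign changes, and multiplies the resulting diagonal entries. The function name `det_by_row_reduction` is an assumption, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def det_by_row_reduction(A):
    """Determinant via reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot: the determinant is zero
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign                # a row interchange flips the sign
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # Adding a multiple of one row to another leaves det unchanged.
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
        result *= M[col][col]           # diagonal entry of the triangular form
    return sign * result


A = [[0, 1, 5], [3, -6, 9], [2, 6, 1]]
print(det_by_row_reduction(A))  # 165
```

For an n x n matrix this takes on the order of n^3 arithmetic operations, which is why row reduction is preferred over cofactor expansion for large determinants.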



EXAMPLE 4 Using Column Operations to Evaluate a Determinant

Compute the determinant of

\[
A = \begin{bmatrix} 1 & 0 & 0 & 3 \\ 2 & 7 & 0 & 6 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -5 \end{bmatrix}
\]

Solution This determinant could be computed as above by using elementary row operations to reduce A to row echelon form, but we can put A in lower triangular form in one step by adding -3 times the first column to the fourth to obtain

\[
\det(A) = \det \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 7 & 0 & 0 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -26 \end{bmatrix} = (1)(7)(3)(-26) = -546
\]



Example 4 points out that it is always wise to keep an eye open for column operations that can shorten computations.



Cofactor expansion and row or column operations can sometimes be used in combination to provide an effective method 
for evaluating determinants. The following example illustrates this idea. 

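EXAMPLE 5 Row Operations and Cofactor Expansion

Evaluate det(A) where

\[
A = \begin{bmatrix} 3 & 5 & -2 & 6 \\ 1 & 2 & -1 & 1 \\ 2 & 4 & 1 & 5 \\ 3 & 7 & 5 & 3 \end{bmatrix}
\]

Solution By adding suitable multiples of the second row to the remaining rows, we obtain

\[
\begin{aligned}
\det(A) &= \begin{vmatrix} 0 & -1 & 1 & 3 \\ 1 & 2 & -1 & 1 \\ 0 & 0 & 3 & 3 \\ 0 & 1 & 8 & 0 \end{vmatrix} \\
&= -\begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 1 & 8 & 0 \end{vmatrix}
&& \text{Cofactor expansion along the first column.} \\
&= -\begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 0 & 9 & 3 \end{vmatrix}
&& \text{We added the first row to the third row.} \\
&= -(-1)\begin{vmatrix} 3 & 3 \\ 9 & 3 \end{vmatrix}
&& \text{Cofactor expansion along the first column.} \\
&= -18
\end{aligned}
\]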
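As a quick check of Example 5, the sketch below evaluates both the original matrix and the matrix obtained after the row operations by plain cofactor expansion. The helper `det` is an illustrative implementation (expansion along the first row), not a routine from the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        # Minor: delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

a = [[3, 5, -2, 6], [1, 2, -1, 1], [2, 4, 1, 5], [3, 7, 5, 3]]
# The same matrix after adding -3, -2, and -3 times row 2 to rows 1, 3, and 4.
reduced = [[0, -1, 1, 3], [1, 2, -1, 1], [0, 0, 3, 3], [0, 1, 8, 0]]

print(det(a), det(reduced))  # -18 -18
```

Both values agree because adding a multiple of one row to another does not change the determinant.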



Skills 

• Know the effect of elementary row operations on the value of a determinant. 

• Know the determinants of the three types of elementary matrices. 

• Know how to introduce zeros into the rows or columns of a matrix to facilitate the evaluation of its determinant. 

• Use row reduction to evaluate the determinant of a matrix. 

• Use column operations to evaluate the determinant of a matrix. 

• Combine the use of row reduction and cofactor expansion to evaluate the determinant of a matrix. 



Exercise Set 2.2 



In Exercises 1-4, verify that det(A) = det(A^T).



'=[1 -3] 



3. 



A = 



A = 



2 -1 3 
1 2 4 
5 -3 6 

4 2-1 

0 2-3 
-1 1 5 



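In Exercises 5-9, find the determinant of the given elementary matrix by inspection.

5.
\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

Answer:

-5

6.
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -5 & 0 & 1 \end{bmatrix}
\]

7.
\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

Answer:

-1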

1 

0 

0 
0 



0 0 

-lo 



0 1 0 
0 0 1 



9.
\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -9 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

Answer:

1

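In Exercises 10-17, evaluate the determinant of the given matrix by reducing the matrix to row echelon form.

10.
\[
\begin{bmatrix} 3 & 6 & -9 \\ 0 & 0 & -2 \\ -2 & 1 & 5 \end{bmatrix}
\]

11.
\[
\begin{bmatrix} 0 & 3 & 1 \\ 1 & 1 & 2 \\ 3 & 2 & 4 \end{bmatrix}
\]

Answer:

5

12.
\[
\begin{bmatrix} 1 & -3 & 0 \\ -2 & 4 & 1 \\ 5 & -2 & 2 \end{bmatrix}
\]

13.
\[
\begin{bmatrix} 3 & -6 & 9 \\ -2 & 7 & -2 \\ 0 & 1 & 5 \end{bmatrix}
\]

Answer:

33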



14. 



15. 



1 -2 

5 -9 

-1 2 

2 8 

2 13 1 
10 11 
0 2 10 
0 12 3 

Answer: 



-6 
6 



-2 
1 



16. 



1 1 1 
1 



1 1 1 

2 2 

111 

3 3 3 

-i 2 0 

3 3 

17. 1 3 1 5 3 
-2-7 0-4 2 

0 0 1 0 1 

0 0 2 1 1 

0 0 0 1 1 

Answer: 
-2 

18. Repeat Exercises 10-13 by using a combination of row reduction and cofactor expansion. 

19. Repeat Exercises 14—17 by using a combination of row operations and cofactor expansion. 

Answer: 

Exercise 14: 39; Exercise 15: 6; Exercise 16: — i; Exercise 17: —2 

6 



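In Exercises 20-27, evaluate the determinant, given that

\[
\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = -6
\]

20.
\[
\begin{vmatrix} g & h & i \\ d & e & f \\ a & b & c \end{vmatrix}
\]

21.
\[
\begin{vmatrix} d & e & f \\ g & h & i \\ a & b & c \end{vmatrix}
\]

Answer:
-6

22.
\[
\begin{vmatrix} a & b & c \\ d & e & f \\ 2a & 2b & 2c \end{vmatrix}
\]

23.
\[
\begin{vmatrix} 3a & 3b & 3c \\ -d & -e & -f \\ 4g & 4h & 4i \end{vmatrix}
\]

Answer:

72

24.
\[
\begin{vmatrix} a+d & b+e & c+f \\ -d & -e & -f \\ g & h & i \end{vmatrix}
\]

25.
\[
\begin{vmatrix} a+g & b+h & c+i \\ d & e & f \\ g & h & i \end{vmatrix}
\]

Answer:

-6

26.
\[
\begin{vmatrix} a & b & c \\ 2d & 2e & 2f \\ g+3a & h+3b & i+3c \end{vmatrix}
\]

27.
\[
\begin{vmatrix} -3a & -3b & -3c \\ d & e & f \\ g-4d & h-4e & i-4f \end{vmatrix}
\]

Answer:

18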

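28. Show that

(a)
\[
\det \begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = -a_{13}a_{22}a_{31}
\]

(b)
\[
\det \begin{bmatrix} 0 & 0 & 0 & a_{14} \\ 0 & 0 & a_{23} & a_{24} \\ 0 & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = a_{14}a_{23}a_{32}a_{41}
\]

29. Use row reduction to show that

\[
\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b-a)(c-a)(c-b)
\]

In Exercises 30-33, confirm the identities without evaluating the determinants directly.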



30. 


a\ { b\t a2-¥b2t a^-^-b^t 










a2 «3 




a\t V b\ a2t + ^2 ^ 


3t^b2 












C2 


^3 












31. 


a\ b\ a\-^b\+c\ 




a\ 














a2 b2 a2^b2^C2 




^2 


h 


C2 










a2 63 +63+^3 






h 


C3 








32. 


a\ bi^^tai c\=hrbi=^sa\ 




^1 




02 






a2 b2^ta2 C2^rb2^sa2 




il 


hi 


b2 






a2 63+^^23 £73 + /"i3 + 5<33 






C2 


C2 




33. 


a\~{-b\ a\—b\ c\ 








61 


ci 








a2^b2 a2-b2 C2 




-2 


^2 


h2 










^23-hi3 ^23 — ^3 C2, 






as 


b3 


^3 







34. Find the determinant of the following matrix.

\[
\begin{bmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{bmatrix}
\]



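In Exercises 35-36, show that det(A) = 0 without directly evaluating the determinant.

35.
\[
A = \begin{bmatrix} -2 & 8 & 1 & 4 \\ 3 & 2 & 5 & 1 \\ 1 & 10 & 6 & 5 \\ 4 & -6 & 4 & -3 \end{bmatrix}
\]

36.
\[
A = \begin{bmatrix} -4 & 1 & 1 & 1 & 1 \\ 1 & -4 & 1 & 1 & 1 \\ 1 & 1 & -4 & 1 & 1 \\ 1 & 1 & 1 & -4 & 1 \\ 1 & 1 & 1 & 1 & -4 \end{bmatrix}
\]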



True-False Exercises 



In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 



(a) If A is a 4 × 4 matrix and B is obtained from A by interchanging the first two rows and then interchanging the last two rows, then det(B) = det(A).

Answer:

True

(b) If A is a 3 × 3 matrix and B is obtained from A by multiplying the first column by 4 and multiplying the third column by 3/4, then det(B) = 3 det(A).

Answer:

True

(c) If A is a 3 × 3 matrix and B is obtained from A by adding 5 times the first row to each of the second and third rows, then det(B) = 25 det(A).

Answer:

False

(d) If A is an n × n matrix and B is obtained from A by multiplying each row of A by its row number, then

Answer:

False

(e) If A is a square matrix with two identical columns, then det(A) = 0.

Answer:

True

(f) If the sum of the second and fourth row vectors of a 6 × 6 matrix A is equal to the last row vector, then det(A) = 0.

Answer:

True



Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



2.3 Properties of Determinants; Cramer's Rule 

In this section we will develop some fundamental properties of matrices, and we will use these results to derive a 
formula for the inverse of an invertible matrix and formulas for the solutions of certain kinds of linear systems. 



Basic Properties of Determinants 



Suppose that A and B are n × n matrices and k is any scalar. We begin by considering possible relationships between det(A), det(B), and

det(kA), det(A + B), and det(AB)

Since a common factor of any row of a matrix can be moved through the determinant sign, and since each of the n rows in kA has a common factor of k, it follows that

\[
\det(kA) = k^n \det(A) \tag{1}
\]

For example,

\[
\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ ka_{21} & ka_{22} & ka_{23} \\ ka_{31} & ka_{32} & ka_{33} \end{vmatrix}
= k^3 \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
\]
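Formula (1) is easy to check numerically. The sketch below uses an illustrative 3 × 3 matrix and the scalar k = 2, both chosen here as assumptions rather than taken from the text; `det` and `scale` are hypothetical helper names.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def scale(k, m):
    """Multiply every entry of m by the scalar k."""
    return [[k * x for x in row] for row in m]

a = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]  # illustrative matrix with det(a) = 64
k = 2
n = len(a)

print(det(scale(k, a)), k ** n * det(a))  # 512 512
```

The factor is k^n rather than k because one factor of k comes out of each of the n rows.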



Unfortunately, no simple relationship exists among det(A), det(B), and det(A + B). In particular, we emphasize that det(A + B) will usually not be equal to det(A) + det(B). The following example illustrates this fact.

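EXAMPLE 1 det(A + B) ≠ det(A) + det(B)

Consider

\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}, \quad
A + B = \begin{bmatrix} 4 & 3 \\ 3 & 8 \end{bmatrix}
\]

We have det(A) = 1, det(B) = 8, and det(A + B) = 23; thus

\[
\det(A + B) \neq \det(A) + \det(B)
\]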
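The three determinants in Example 1 can be confirmed with a few lines of code. This is a small sketch; the helper name `det2` is an assumption made for illustration.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

a = [[1, 2], [2, 5]]
b = [[3, 1], [1, 3]]
# Entrywise sum A + B.
s = [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

print(det2(a), det2(b), det2(s))  # 1 8 23
```

Since 1 + 8 = 9 while det(A + B) = 23, the determinant is not additive in general.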



In spite of the previous example, there is a useful relationship concerning sums of determinants that is applicable 
when the matrices involved are the same except for one row (column). For example, consider the following two 
matrices that differ only in the second row: 



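\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}
\]

Calculating the determinants of A and B we obtain

\[
\begin{aligned}
\det(A) + \det(B) &= (a_{11}a_{22} - a_{12}a_{21}) + (a_{11}b_{22} - a_{12}b_{21}) \\
&= a_{11}(a_{22} + b_{22}) - a_{12}(a_{21} + b_{21}) \\
&= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
\end{aligned}
\]

Thus

\[
\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
+ \det \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}
= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
\]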



This is a special case of the following general result. 



THEOREM 2.3.1 



Let A, B, and C be n × n matrices that differ only in a single row, say the rth, and assume that the rth row of C can be obtained by adding corresponding entries in the rth rows of A and B. Then

det(C) = det(A) + det(B)

The same result holds for columns. 



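EXAMPLE 2 Sums of Determinants

We leave it to you to confirm the following equality by evaluating the determinants.

\[
\det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 + 0 & 4 + 1 & 7 + (-1) \end{bmatrix}
= \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 & 4 & 7 \end{bmatrix}
+ \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 0 & 1 & -1 \end{bmatrix}
\]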
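The equality in Example 2 can be confirmed by direct evaluation. In the sketch below, `det` is an illustrative cofactor-expansion helper, and the names `a`, `b`, `c` follow the roles of A, B, C in Theorem 2.3.1 (the third row of `c` is the sum of the third rows of `a` and `b`).

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

a = [[1, 7, 5], [2, 0, 3], [1, 4, 7]]
b = [[1, 7, 5], [2, 0, 3], [0, 1, -1]]
# c shares the first two rows; its third row is the entrywise sum of the third rows.
c = a[:2] + [[x + y for x, y in zip(a[2], b[2])]]

print(det(c), det(a), det(b))  # -28 -49 21
```

Here det(C) = -28 = -49 + 21 = det(A) + det(B), as the theorem predicts.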



Determinant of a Matrix Product 

Considering the complexity of the formulas for determinants and matrix multiplication, it would seem unlikely 
that a simple relationship should exist between them. This is what makes the simplicity of our next result so 
surprising. We will show that if A and B are square matrices of the same size, then 

\[
\det(AB) = \det(A)\det(B) \tag{2}
\]

The proof of this theorem is fairly intricate, so we will have to develop some preliminary results first. We begin with the special case of (2) in which A is an elementary matrix. Because this special case is only a prelude to (2), we call it a lemma.



LEMMA 2.3.2 



If B is an n × n matrix and E is an n × n elementary matrix, then

det(EB) = det(E) det(B)

Proof We will consider three cases, each in accordance with the row operation that produces the matrix E.

Case 1 If E results from multiplying a row of I_n by k, then by Theorem 1.5.1, EB results from B by multiplying the corresponding row by k; so from Theorem 2.2.3(a) we have

det(EB) = k det(B)

But from Theorem 2.2.4(a) we have det(E) = k, so

det(EB) = det(E) det(B)

Cases 2 and 3 The proofs of the cases where E results from interchanging two rows of I_n or from adding a multiple of one row to another follow the same pattern as Case 1 and are left as exercises.

Remark It follows by repeated applications of Lemma 2.3.2 that if B is an n × n matrix and E_1, E_2, ..., E_r are n × n elementary matrices, then

\[
\det(E_1 E_2 \cdots E_r B) = \det(E_1)\det(E_2)\cdots\det(E_r)\det(B) \tag{3}
\]
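Lemma 2.3.2 can be checked on one elementary matrix of each type. The matrices below are illustrative choices, not taken from the text, and `det` and `matmul` are hypothetical helper names.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

b = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]          # illustrative matrix, det = 64
e_scale = [[1, 0, 0], [0, 5, 0], [0, 0, 1]]      # multiply row 2 by 5
e_swap = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]       # interchange rows 1 and 3
e_add = [[1, 0, 0], [0, 1, 0], [4, 0, 1]]        # add 4 times row 1 to row 3

for e in (e_scale, e_swap, e_add):
    print(det(matmul(e, b)), det(e) * det(b))
```

The three elementary matrices have determinants 5, -1, and 1, so each printed pair shows det(EB) matching det(E) det(B).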



Determinant Test for Invertibility 

Our next theorem provides an important criterion for determining whether a matrix is invertible. It also takes us a 
step closer to establishing Formula 2. 

THEOREM 2.3.3

A square matrix A is invertible if and only if det(A) ≠ 0.

Proof Let R be the reduced row echelon form of A. As a preliminary step, we will show that det(A) and det(R) are both zero or both nonzero: Let E_1, E_2, ..., E_r be the elementary matrices that correspond to the elementary row operations that produce R from A. Thus

\[
R = E_r \cdots E_2 E_1 A
\]

and from (3),

\[
\det(R) = \det(E_r)\cdots\det(E_2)\det(E_1)\det(A) \tag{4}
\]

We pointed out in the margin note that accompanies Theorem 2.2.4 that the determinant of an elementary matrix is nonzero. Thus, it follows from Formula (4) that det(A) and det(R) are either both zero or both nonzero, which sets the stage for the main part of the proof. If we assume first that A is invertible, then it follows from Theorem 1.6.4 that R = I and hence that det(R) = 1 (≠ 0). This, in turn, implies that det(A) ≠ 0, which is what we wanted to show.

Conversely, assume that det(A) ≠ 0. It follows from this that det(R) ≠ 0, which tells us that R cannot have a row of zeros. Thus, it follows from Theorem 1.4.3 that R = I and hence that A is invertible by Theorem 1.6.4.

EXAMPLE 3 Determinant Test for Invertibility

Since the first and third rows of

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 2 & 4 & 6 \end{bmatrix}
\]

are proportional, det(A) = 0. Thus A is not invertible.

It follows from Theorems 2.3.3 and 2.2.5 that a square matrix with two proportional rows or two proportional columns is not invertible.
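The determinant test of Theorem 2.3.3 translates directly into a short predicate. This is a sketch for matrices with integer (or exact rational) entries only; with floating-point entries a comparison against exactly zero would not be reliable. The names `det` and `is_invertible` are illustrative, and the second matrix is an arbitrary example chosen here.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def is_invertible(m):
    """Theorem 2.3.3: a square matrix is invertible exactly when its determinant is nonzero."""
    return det(m) != 0

proportional_rows = [[1, 2, 3], [1, 0, 1], [2, 4, 6]]   # matrix of Example 3
other = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]             # illustrative matrix, det = 64

print(is_invertible(proportional_rows), is_invertible(other))  # False True
```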



We are now ready for the main result concerning products of matrices. 



THEOREM 2.3.4

If A and B are square matrices of the same size, then

\[
\det(AB) = \det(A)\det(B)
\]

Proof We divide the proof into two cases that depend on whether or not A is invertible. If the matrix A is not invertible, then by Theorem 1.6.5 neither is the product AB. Thus, from Theorem 2.3.3, we have det(AB) = 0 and det(A) = 0, so it follows that det(AB) = det(A) det(B).



Augustin Louis Cauchy (1789-1857) 



Historical Note In 1815 the great French mathematician Augustin Cauchy published a landmark paper in which he gave the first systematic and modern treatment of determinants. It was in that paper that Theorem 2.3.4 was stated and proved in full generality for the first time. Special cases of the theorem had been stated and proved earlier, but it was Cauchy who made the final jump.
[Image: The Granger Collection, New York] 



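Now assume that A is invertible. By Theorem 1.6.4, the matrix A is expressible as a product of elementary matrices, say

\[
A = E_1 E_2 \cdots E_r \tag{5}
\]

so

\[
AB = E_1 E_2 \cdots E_r B
\]

Applying (3) to this equation yields

\[
\det(AB) = \det(E_1)\det(E_2)\cdots\det(E_r)\det(B)
\]

and applying (3) again yields

\[
\det(AB) = \det(E_1 E_2 \cdots E_r)\det(B)
\]

which, from (5), can be written as det(AB) = det(A) det(B).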



EXAMPLE 4 Verifying That det(AB) = det(A) det(B)

Consider the matrices

\[
A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} -1 & 3 \\ 5 & 8 \end{bmatrix}, \quad
AB = \begin{bmatrix} 2 & 17 \\ 3 & 14 \end{bmatrix}
\]

We leave it for you to verify that

det(A) = 1, det(B) = -23, and det(AB) = -23

Thus det(AB) = det(A) det(B), as guaranteed by Theorem 2.3.4.
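The verification left to the reader in Example 4 takes only a few lines of code. In this sketch the helper names `det2` and `matmul` are assumptions made for illustration.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

a = [[3, 1], [2, 1]]
b = [[-1, 3], [5, 8]]
ab = matmul(a, b)

print(ab)                           # [[2, 17], [3, 14]]
print(det2(a), det2(b), det2(ab))   # 1 -23 -23
```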



The following theorem gives a useful relationship between the determinant of an invertible matrix and the determinant of its inverse.

THEOREM 2.3.5

If A is invertible, then

\[
\det(A^{-1}) = \frac{1}{\det(A)}
\]

Proof Since A^{-1}A = I, it follows that det(A^{-1}A) = det(I). Therefore, we must have det(A^{-1}) det(A) = 1. Since det(A) ≠ 0, the proof can be completed by dividing through by det(A).



Adjoint of a Matrix 

In a cofactor expansion we compute det(A) by multiplying the entries in a row or column by their cofactors and adding the resulting products. It turns out that if one multiplies the entries in any row by the corresponding cofactors from a different row, the sum of these products is always zero. (This result also holds for columns.) Although we omit the general proof, the next example illustrates the idea of the proof in a special case.

It follows from Theorems 2.3.5 and 2.1.2 that if A is an invertible triangular matrix, then

\[
\det(A^{-1}) = \frac{1}{a_{11}} \, \frac{1}{a_{22}} \cdots \frac{1}{a_{nn}}
\]

Moreover, by using the adjoint formula it is possible to show that

\[
\frac{1}{a_{11}}, \; \frac{1}{a_{22}}, \; \ldots, \; \frac{1}{a_{nn}}
\]

are actually the successive diagonal entries of A^{-1} (compare A and A^{-1} in Example 3 of Section 1.7).



EXAMPLE 5 Entries and Cofactors from Different Rows

Let

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

Consider the quantity

\[
a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33}
\]

that is formed by multiplying the entries in the first row by the cofactors of the corresponding entries in the third row and adding the resulting products. We can show that this quantity is equal to zero by the following trick: Construct a new matrix A' by replacing the third row of A with another copy of the first row. That is,

\[
A' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{bmatrix}
\]

Let C'_{31}, C'_{32}, C'_{33} be the cofactors of the entries in the third row of A'. Since the first two rows of A and A' are the same, and since the computations of C_{31}, C_{32}, C_{33}, C'_{31}, C'_{32}, and C'_{33} involve only entries from the first two rows of A and A', it follows that

\[
C_{31} = C'_{31}, \quad C_{32} = C'_{32}, \quad C_{33} = C'_{33}
\]

Since A' has two identical rows, it follows from Theorem 2.2.5 that

\[
\det(A') = 0 \tag{6}
\]

On the other hand, evaluating det(A') by cofactor expansion along the third row gives

\[
\det(A') = a_{11}C'_{31} + a_{12}C'_{32} + a_{13}C'_{33} = a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} \tag{7}
\]

From (6) and (7) we obtain

\[
a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} = 0
\]
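The identity in Example 5 can be tested on a concrete matrix. The sketch below uses an illustrative 3 × 3 matrix chosen here as an assumption; `det` and `cofactor` are hypothetical helper names, and indices are 0-based in the code (so row 1 of the text is index 0).

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def cofactor(m, i, j):
    """Cofactor C_ij (0-based): signed determinant of m with row i and column j deleted."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

a = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]  # illustrative matrix with det(a) = 64

# Entries of the first row times cofactors of the third row.
mixed = sum(a[0][j] * cofactor(a, 2, j) for j in range(3))
# Entries of the third row times their own cofactors.
same = sum(a[2][j] * cofactor(a, 2, j) for j in range(3))

print(mixed, same)  # 0 64
```

Mixing rows gives 0, while matching each entry with its own cofactor reproduces det(A).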



DEFINITION 1 

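If A is any n × n matrix and C_{ij} is the cofactor of a_{ij}, then the matrix

\[
\begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1n} \\
C_{21} & C_{22} & \cdots & C_{2n} \\
\vdots & \vdots & & \vdots \\
C_{n1} & C_{n2} & \cdots & C_{nn}
\end{bmatrix}
\]

is called the matrix of cofactors from A. The transpose of this matrix is called the adjoint of A and is denoted by adj(A).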



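EXAMPLE 6 Adjoint of a 3 × 3 Matrix

Let

\[
A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 6 & 3 \\ 2 & -4 & 0 \end{bmatrix}
\]

The cofactors of A are

\[
\begin{array}{lll}
C_{11} = 12 & C_{12} = 6 & C_{13} = -16 \\
C_{21} = 4 & C_{22} = 2 & C_{23} = 16 \\
C_{31} = 12 & C_{32} = -10 & C_{33} = 16
\end{array}
\]

so the matrix of cofactors is

\[
\begin{bmatrix} 12 & 6 & -16 \\ 4 & 2 & 16 \\ 12 & -10 & 16 \end{bmatrix}
\]

and the adjoint of A is

\[
\operatorname{adj}(A) = \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
\]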
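Definition 1 can be turned into a short routine that builds the matrix of cofactors and transposes it. This is a sketch with hypothetical helper names (`det`, `cofactor_matrix`, `adjoint`), run on the matrix of Example 6.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def cofactor_matrix(m):
    """Matrix whose (i, j) entry is the cofactor C_ij of m."""
    n = len(m)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            minor = [r[:j] + r[j + 1:] for k, r in enumerate(m) if k != i]
            row.append((-1) ** (i + j) * det(minor))
        result.append(row)
    return result

def adjoint(m):
    """Transpose of the matrix of cofactors."""
    return [list(col) for col in zip(*cofactor_matrix(m))]

a = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]
print(cofactor_matrix(a))  # [[12, 6, -16], [4, 2, 16], [12, -10, 16]]
print(adjoint(a))          # [[12, 4, 12], [6, 2, -10], [-16, 16, 16]]
```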



Leonard Eugene Dickson (1874-1954) 

Historical Note The use of the term adjoint for the transpose of the matrix of cofactors appears to have been introduced by the American mathematician L. E. Dickson in a research paper that he published in 1902.

[Image: Courtesy of the American Mathematical Society] 



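In Theorem 1.4.5 we gave a formula for the inverse of a 2 × 2 invertible matrix. Our next theorem extends that result to n × n invertible matrices.

THEOREM 2.3.6 Inverse of a Matrix Using Its Adjoint

If A is an invertible matrix, then

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) \tag{8}
\]

Proof We show first that

\[
A \operatorname{adj}(A) = \det(A) I
\]

Consider the product

\[
A \operatorname{adj}(A) =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{j1} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{j2} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{jn} & \cdots & C_{nn}
\end{bmatrix}
\]

The entry in the ith row and jth column of the product A adj(A) is

\[
a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn} \tag{9}
\]

(the ith row of A multiplied by the jth column of adj(A)).

If i = j, then (9) is the cofactor expansion of det(A) along the ith row of A (Theorem 2.1.1), and if i ≠ j, then the a's and the cofactors come from different rows of A, so the value of (9) is zero. Therefore,

\[
A \operatorname{adj}(A) =
\begin{bmatrix}
\det(A) & 0 & \cdots & 0 \\
0 & \det(A) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \det(A)
\end{bmatrix}
= \det(A) I \tag{10}
\]

Since A is invertible, det(A) ≠ 0. Therefore, Equation (10) can be rewritten as

\[
\frac{1}{\det(A)} [A \operatorname{adj}(A)] = I
\quad \text{or} \quad
A \left[ \frac{1}{\det(A)} \operatorname{adj}(A) \right] = I
\]

Multiplying both sides on the left by A^{-1} yields

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
\]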

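EXAMPLE 7 Using the Adjoint to Find an Inverse Matrix

Use (8) to find the inverse of the matrix A in Example 6.

Solution We leave it for you to check that det(A) = 64. Thus

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
= \frac{1}{64} \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
= \begin{bmatrix}
\frac{12}{64} & \frac{4}{64} & \frac{12}{64} \\[2pt]
\frac{6}{64} & \frac{2}{64} & -\frac{10}{64} \\[2pt]
-\frac{16}{64} & \frac{16}{64} & \frac{16}{64}
\end{bmatrix}
\]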
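Formula (8) can be sketched as a small routine that returns exact fractions. The function names below (`det`, `adjoint`, `inverse_by_adjoint`, `matmul`) are illustrative assumptions; the final check multiplies A by the computed inverse to confirm that the product is the identity.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def adjoint(m):
    """Transpose of the matrix of cofactors of m."""
    n = len(m)
    cof = [[(-1) ** (i + j) * det([r[:j] + r[j + 1:] for k, r in enumerate(m) if k != i])
            for j in range(n)] for i in range(n)]
    return [list(col) for col in zip(*cof)]

def inverse_by_adjoint(m):
    """Formula (8): A^{-1} = adj(A) / det(A); requires det(A) != 0."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjoint(m)]

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

a = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]
a_inv = inverse_by_adjoint(a)
print(a_inv[0])  # [Fraction(3, 16), Fraction(1, 16), Fraction(3, 16)]
print(matmul(a, a_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

The fractions are reduced automatically, so 12/64 appears as 3/16. For large matrices this approach is far slower than row reduction, since it evaluates n^2 cofactors.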



Cramer's Rule 

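Our next theorem uses the formula for the inverse of an invertible matrix to produce a formula, called Cramer's rule, for the solution of a linear system Ax = b of n equations in n unknowns in the case where the coefficient matrix A is invertible (or, equivalently, when det(A) ≠ 0).

THEOREM 2.3.7 Cramer's Rule

If Ax = b is a system of n linear equations in n unknowns such that det(A) ≠ 0, then the system has a unique solution. This solution is

\[
x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)}
\]

where A_j is the matrix obtained by replacing the entries in the jth column of A by the entries in the matrix

\[
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]

Proof If det(A) ≠ 0, then A is invertible, and by Theorem 1.6.2, x = A^{-1}b is the unique solution of Ax = b. Therefore, by Theorem 2.3.6 we have

\[
\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det(A)} \operatorname{adj}(A)\mathbf{b}
= \frac{1}{\det(A)}
\begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{nn}
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]

Multiplying the matrices out gives

\[
\mathbf{x} = \frac{1}{\det(A)}
\begin{bmatrix}
b_1C_{11} + b_2C_{21} + \cdots + b_nC_{n1} \\
b_1C_{12} + b_2C_{22} + \cdots + b_nC_{n2} \\
\vdots \\
b_1C_{1n} + b_2C_{2n} + \cdots + b_nC_{nn}
\end{bmatrix}
\]

The entry in the jth row of x is therefore

\[
x_j = \frac{b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}}{\det(A)} \tag{11}
\]

Now let

\[
A_j = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1\,j-1} & b_1 & a_{1\,j+1} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2\,j-1} & b_2 & a_{2\,j+1} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{n\,j-1} & b_n & a_{n\,j+1} & \cdots & a_{nn}
\end{bmatrix}
\]

Since A_j differs from A only in the jth column, it follows that the cofactors of entries b_1, b_2, ..., b_n in A_j are the same as the cofactors of the corresponding entries in the jth column of A. The cofactor expansion of det(A_j) along the jth column is therefore

\[
\det(A_j) = b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}
\]

Substituting this result in (11) gives

\[
x_j = \frac{\det(A_j)}{\det(A)}
\]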

EXAMPLE 8 Using Cramer's Rule to Solve a Linear System

Use Cramer's rule to solve

\[
\begin{aligned}
x_1 \phantom{{}+ 4x_2} + 2x_3 &= 6 \\
-3x_1 + 4x_2 + 6x_3 &= 30 \\
-x_1 - 2x_2 + 3x_3 &= 8
\end{aligned}
\]

Gabriel Cramer (1704-1752)

Historical Note Variations of Cramer's rule were fairly well known before the Swiss mathematician discussed it in work he published in 1750. It was Cramer's superior notation that popularized the method and led mathematicians to attach his name to it.
[Image: Granger Collection]

Solution

\[
A = \begin{bmatrix} 1 & 0 & 2 \\ -3 & 4 & 6 \\ -1 & -2 & 3 \end{bmatrix}, \quad
A_1 = \begin{bmatrix} 6 & 0 & 2 \\ 30 & 4 & 6 \\ 8 & -2 & 3 \end{bmatrix},
\]

\[
A_2 = \begin{bmatrix} 1 & 6 & 2 \\ -3 & 30 & 6 \\ -1 & 8 & 3 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 1 & 0 & 6 \\ -3 & 4 & 30 \\ -1 & -2 & 8 \end{bmatrix}
\]

For n > 3, it is usually more efficient to solve a linear system with n equations in n unknowns by Gauss-Jordan elimination than by Cramer's rule. Its main use is for obtaining properties of solutions of a linear system without actually solving the system.

Therefore,

\[
x_1 = \frac{\det(A_1)}{\det(A)} = \frac{-40}{44} = \frac{-10}{11}, \quad
x_2 = \frac{\det(A_2)}{\det(A)} = \frac{72}{44} = \frac{18}{11}, \quad
x_3 = \frac{\det(A_3)}{\det(A)} = \frac{152}{44} = \frac{38}{11}
\]
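Cramer's rule as stated in Theorem 2.3.7 can be sketched in a few lines. The function names `det` and `cramer` are illustrative assumptions, and exact fractions are used so that the output matches the hand computation in Example 8.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def cramer(a, b):
    """Solve a x = b by Cramer's rule; requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule does not apply: det(A) = 0")
    solution = []
    for j in range(len(a)):
        # A_j: replace column j of A by the column of constants b.
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution

a = [[1, 0, 2], [-3, 4, 6], [-1, -2, 3]]
b = [6, 30, 8]
print(cramer(a, b))  # [Fraction(-10, 11), Fraction(18, 11), Fraction(38, 11)]
```

Each unknown costs one extra n × n determinant, which is why the margin note recommends Gauss-Jordan elimination for larger systems.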



Equivalence Theorem 

In Theorem 1.6.4 we listed five results that are equivalent to the invertibility of a matrix A. We conclude this 
section by merging Theorem 2.3.3 with that list to produce the following theorem that relates all of the major 
topics we have studied thus far. 



THEOREM 2.3.8 Equivalent Statements 

If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is I_n.

(d) A can be expressed as a product of elementary matrices.

(e) Ax = b is consistent for every n × 1 matrix b.

(f) Ax = b has exactly one solution for every n × 1 matrix b.

(g) det(A) ≠ 0.



OPTIONAL 

We now have all of the machinery necessary to prove the following two results, which we stated without proof in Theorem 1.7.1:

* Theorem 1.7.1(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero.

* Theorem 1.7.1(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper triangular matrix is upper triangular.

Proof of Theorem 1.7.1(c) Let A = [a_{ij}] be a triangular matrix, so that its diagonal entries are

\[
a_{11}, a_{22}, \ldots, a_{nn}
\]

From Theorem 2.1.2, the matrix A is invertible if and only if

\[
\det(A) = a_{11}a_{22}\cdots a_{nn}
\]

is nonzero, which is true if and only if the diagonal entries are all nonzero.

Proof of Theorem 1.7.1(d) We will prove the result for upper triangular matrices and leave the lower triangular case for you. Assume that A is upper triangular and invertible. Since

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
\]

we can prove that A^{-1} is upper triangular by showing that adj(A) is upper triangular or, equivalently, that the matrix of cofactors is lower triangular. We can do this by showing that every cofactor C_{ij} with i < j (i.e., above the main diagonal) is zero. Since

\[
C_{ij} = (-1)^{i+j} M_{ij}
\]

it suffices to show that each minor M_{ij} with i < j is zero. For this purpose, let B_{ij} be the matrix that results when the ith row and jth column of A are deleted, so

\[
M_{ij} = \det(B_{ij}) \tag{12}
\]

From the assumption that i < j, it follows that B_{ij} is upper triangular (see Figure 1.7.1). Since A is upper triangular, its (i + 1)-st row begins with at least i zeros. But the ith row of B_{ij} is the (i + 1)-st row of A with the entry in the jth column removed. Since i < j, none of the first i zeros is removed by deleting the jth column; thus the ith row of B_{ij} starts with at least i zeros, which implies that this row has a zero on the main diagonal. It now follows from Theorem 2.1.2 that det(B_{ij}) = 0 and from (12) that M_{ij} = 0.



Concept Review 

• Determinant test for invertibility 

• Matrix of cofactors 

• Adjoint of a matrix 

• Cramer's rule 

• Equivalent statements about an invertible matrix 

Skills 

• Know how determinants behave with respect to basic arithmetic operations, as given in Equation 1, 
Theorem 2.3.1, Lemma 2.3.2, and Theorem 2.3.4. 

• Use the determinant to test a matrix for invertibility. 



• Know how det(A) and det(A^{-1}) are related.

• Compute the matrix of cofactors for a square matrix A.

• Compute adj(A) for a square matrix A.

• Use the adjoint of an invertible matrix to find its inverse. 

• Use Cramer's rule to solve linear systems of equations. 

• Know the equivalent characterizations of an invertible matrix given in Theorem 2.3.8. 



Exercise Set 2.3 

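In Exercises 1-4, verify that det(kA) = k^n det(A).

1.
\[
A = \begin{bmatrix} -1 & 2 \\ 3 & 4 \end{bmatrix}; \quad k = 2
\]

2.
\[
A = \begin{bmatrix} 2 & 2 \\ 5 & -2 \end{bmatrix}; \quad k = -4
\]

3.
\[
A = \begin{bmatrix} 2 & -1 & 3 \\ 3 & 2 & 1 \\ 1 & 4 & 5 \end{bmatrix}; \quad k = -2
\]

4.
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 1 & -2 \end{bmatrix}; \quad k = 3
\]

In Exercises 5-6, verify that det(AB) = det(BA) and determine whether the equality det(A + B) = det(A) + det(B) holds.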



5. 


'2 1 


0" 






"1 


-1 3" 




A = 


3 4 


0 


and B = 


7 


1 2 






0 0 








5 


0 1 




6. 


■-1 


8 


2' 






'2 -1 


-4 


A = 


1 


0 


-1 


and 5 = 


1 1 


3 




-2 


2 


2 






0 3 


-1 



In Exercises 7-14, use determinants to decide whether the given matrix is invertible. 
7. 



A = 



2 
-1 

2 



5 5 
-1 0 
4 3 



Answer: 



Invertible 



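8.
\[
A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}
\]

9.
\[
A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}
\]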

Answer: 



Invertible 






10. 


'-3 


0 


r 


A = 


5 


0 


6 




8 


0 


3 


11. 


r 4 


2 


8 




-2 


1 


-4 




3 


1 


6 



Answer: 



Not invertible 



12. 


1 


0 


-1 


i4 = 


9 - 


-1 


4 




8 


9 


-1 


13. 


2 


0 


0" 


A = 


8 


1 


0 




-5 


3 


6 



Answer: 



Invertible 



14. 



A = 



/2 0 

3/2 -3/7 0 

5 -9 0 



In Exercises 15-18, find the values of k for which A is invertible. 



Answer: 



16. 



i4 = 



2 

A: 2" 
2 ;t 



17. 



1 2 4 

3 1 6 
yt 3 2 



Answer: 
-1 



18. 



i4 = 



1 2 0 
k 1 yt 
0 2 1 



In Exercises 19-23, decide whether the given matrix is invertible, and if so, use the adjoint method to find its 
inverse. 



19. 



2 5 5 
-1 -1 0 
2 4 3 



Answer: 



3 -5 -5 
-3 4 5 
2 -2 -3 



20. 



A = 



21. 



A = 



2 0 3 

0 3 2 

-2 0 -4 

2 -3 5 

0 1 -3 

0 0 2 



Answer: 



.1-^ = 



1 ^ 1 

2 2 



0 

0 



22. 



A = 



23. 



i4 = 



2 0 0 
8 1 0 
-5 3 6 

13 11 

2 5 2 2 

13 8 9 

13 2 2 



Answer: 



-4 
2 

-7 
6 



0 -1 

0 0 
-1 8 

1 -7 



In Exercises 24-29, solve by Cramer's rule, where it applies. 
• 2x2 = 3 



24. 7X1 
3x1 

25. 4x 
llx 



+ 
+ 
+ 
+ 



X2 = 

y + 
5;^ + 



2z 
2z 



2 
3 
1 



26. 



Answer: 

11' 
X — 



4x - 



y 



-2- ^=_J- 

ir^ 11 

+ r = 6 
+ 2z = -1 



2x 


+ 


^ - 3? = 


-20 




27. Jri 






+ X3 




4 




2x1 










-2 




4x1 






- 3x3 




0 




Answer: 












^1 = 


11' ^ 


38 
~ 11' 


X3 = 


40 
" " 11 




28. -x\ 




4X2 


+ 2x3 


+ 


X4 = 


-32 


2X1 




X2 


+ 7x3 


+ 


9x4 = 


14 


-x\ 


+ 


X2 


+ 3x3 


+ 


X4 = 


11 


x\ 




2X2 


+ X3 




4x4 = 


-4 


29. 3x1 






-¥ X2 




4 




-^1 






— 2x2 




1 




2x1 


+ 


6x2 


- X2 




5 





Answer: 

Cramer's rule does not apply. 
30. Show that the matrix 



\[
A = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

is invertible for all values of θ; then find A^{-1} using Theorem 2.3.6.

31. Use Cramer's rule to solve for y without solving for the unknowns x, z, and w.



4x 


1 


y 


+ 


z 


+ 


w = 


6 


3x 


+ 


ly 




z 


+ 


w = 


1 


Ix 


+ 






5z 


+ 


8w = 


-3 


X 


+ 


y 


+ 


z 




2w = 


3 



Answer: 



y = 0 

32. Let Ax = b be the system in Exercise 31.

(a) Solve by Cramer's rule.

(b) Solve by Gauss-Jordan elimination.

(c) Which method involves fewer computations?

33. Prove that if det(A) = 1 and all the entries in A are integers, then all the entries in A^{-1} are integers.

34. Let Ax = b be a system of n linear equations in n unknowns with integer coefficients and integer constants. Prove that if det(A) = 1, the solution x has integer entries.



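35. Let

\[
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\]

Assuming that det(A) = -7, find

(a) det(3A)

(b) det(A^{-1})

(c) det(2A^{-1})

(d) det((2A)^{-1})

(e)
\[
\det \begin{bmatrix} a & g & d \\ b & h & e \\ c & i & f \end{bmatrix}
\]

Answer:

(a) -189

(b) -1/7

(c) -8/7

(d) -1/56

(e) 7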

36. In each part, find the determinant given that A is a 4 × 4 matrix for which det(A) = -2.

(a) det(-A)

(b) det(A^{-1})

(c) det(2^^) 

(d) det(^^) 

37. In each part, find the determinant given that A is a 3 × 3 matrix for which det(A) = 7.

(a) det(3A)

(b) det(A^{-1})

(c) det(2A^{-1})

(d) det((2A)^{-1})

Answer:

(a) 189

(b) 1/7

(c) 8/7

(d) 1/56

38. Prove that a square matrix A is invertible if and only if A^T A is invertible.

39. Show that if A is a square matrix, then det(A^T A) = det(A A^T).

True-False Exercises 

In parts (a)-(l) determine whether the statement is true or false, and justify your answer. 

(a) If A is a 3 × 3 matrix, then det(2A) = 2 det(A).
Answer:

False

(b) If A and B are square matrices of the same size such that det(A) = det(B), then det(A + B) = 2 det(A).
Answer:

False

(c) If A and B are square matrices of the same size and A is invertible, then

det(A^{-1}BA) = det(B)

Answer:

True

(d) A square matrix A is invertible if and only if det(A) = 0.
Answer:

False

(e) The matrix of cofactors of A is precisely [adj(A)]^T.

Answer:

True

(f) For every n × n matrix A, we have

A · adj(A) = (det(A)) I_n

Answer:

True

(g) If A is a square matrix and the linear system Ax = 0 has multiple solutions for x, then det(A) = 0.
Answer:

True

(h) If A is an n × n matrix and there exists an n × 1 matrix b such that the linear system Ax = b has no solutions, then the reduced row echelon form of A cannot be I_n.

Answer:

True

(i) If E is an elementary matrix, then Ex = 0 has only the trivial solution.
Answer:

True

(j) If A is an invertible matrix, then the linear system Ax = 0 has only the trivial solution if and only if the linear system A^{-1}x = 0 has only the trivial solution.

Answer:

True

(k) If A is invertible, then adj(A) must also be invertible.
Answer:
True

(l) If A has a row of zeros, then so does adj(A).
Answer: 

False 






Chapter 2 Supplementary Exercises 

In Exercises 1-8, evaluate the determinant of the given matrix by (a) cofactor expansion and (b) using 
elementary row operations to introduce zeros into the matrix. 

i.r-4 2" 



Answer: 
-18 




-3 1 1 



Answer: 



24 

-1 -2 -3" 

-4 -5 -6 

—7 —8 —9 

'3 0 -r 
1 1 1 

0 4 2 
Answer: 





-10 








6. 


-5 




1 4" 






3 




0 2 






1 




•2 2_ 




7. 


3 


6 


0 


1 




-2 


3 


1 


4 




1 


0 


-1 


1 




-9 


2 


-2 


2 



Answer: 



329 

8. 



9. Evaluate the determinants in Exercises 3-6 by using the arrow technique (see Example 7 in Section 2.1). 
Answer: 



Exercise 3: 24; Exercise 4: 0; Exercise 5: —10; Exercise 6: .48 

(a) Construct a 4 x 4 matrix whose determinant is easy to compute using cofactor expansion but hard to 
evaluate using elementary row operations. 

(b) Construct a 4 x 4 matrix whose determinant is easy to compute using elementary row operations but 
hard to evaluate using cofactor expansion. 

11. Use the determinant to decide whether the matrices in Exercises 1-4 are invertible. 
Answer: 

The matrices in Exercises 1-3 are invertible, the matrix in Exercise 4 is not. 

12. Use the determinant to decide whether the matrices in Exercises 5-8 are invertible. 

In Exercises 13-15, find the determinant of the given matrix by any method. 



13. 



5 b^3 
i-2 -3 

Answer: 





-b^ + 5b - 


-21 




14. 


3 -4 


a 








1 


2 








2 a-\ 


4 






15. 


0 0 0 




0 


-3 




0 0 0 




4 


0 




0 0-1 




0 


0 




0 2 0 




0 


0 




5 0 0 




0 


0 



Answer: 

-120 
16. Solve for x.



X -1 
3 l-x 



1 0 -3 

2 X -6 

1 3 x-5 



In Exercises 17-24, use the adjoint method (Theorem 2.3.6) to find the inverse of the given matrix, if it 
exists. 



17. The matrix in Exercise 1. 



Answer: 



_1 1 

6 9 

i 2 
6 9 

18. The matrix in Exercise 2. 

19. The matrix in Exercise 3. 



Answer: 



1 


1 


3 


8 


8 


"8 


1 


5 


1 


8 


24 


"24 


1 


7 


1 


4 


"12 


"12 



20. The matrix in Exercise 4. 

21. The matrix in Exercise 5. 



Answer: 



1 


2 


1 


5 


5 


~10 


1 


3 


2 


5 


5 


5 


2 


6 


3 


5 


5 


"10 



22. The matrix in Exercise 6. 

23. The matrix in Exercise 7. 

Answer: 



10 


2 


52 


27 


329 


329 


329 


329 


55 


11 


43 


16 


329 


329 


329 


329 


3 


10 


25 


6 


47 


47 


47 


47 


31 


72 


102 


15 


329 


329 


329 


329 



24. The matrix in Exercise 8. 

25. Use Cramer's rule to solve for x' and y' in terms of x and y.



Answer: 

; 3 4 / 4 3 
^ = + 7 = - + j7 

26. Use Cramer's rule to solve for x' and y' in terms of x and y.

y =x' cos 

27. By examining the determinant of the coefficient matrix, show that the following system has a nontrivial
solution if and only if α = β.

x + y + αz = 0
x + y + βz = 0
αx + βy + z = 0

28. Let A be a 3 x 3 matrix, each of whose entries is 1 or 0. What is the largest possible value for det(A)?

29. (a) For the triangle in the accompanying figure, use trigonometry to show that

b cos γ + c cos β = a
c cos α + a cos γ = b
a cos β + b cos α = c

and then apply Cramer's rule to show that

cos α = (b^2 + c^2 - a^2) / (2bc)

(b) Use Cramer's rule to obtain similar formulas for cos β and cos γ.

Answer:

(b) cos β = (c^2 + a^2 - b^2) / (2ac),  cos γ = (a^2 + b^2 - c^2) / (2ab)

30. Use determinants to show that for all real values of λ, the only solution of

x - 2y = λx



31. Prove: If A is invertible, then adj(A) is invertible and

[adj(A)]^-1 = (1/det(A)) A = adj(A^-1)



32. Prove: If A is an n x n matrix, then

det[adj(A)] = [det(A)]^(n-1)



33. Prove: If the entries in each row of an n x n matrix A add up to zero, then the determinant of A is zero.
[Hint: Consider the product AX, where X is the n x 1 matrix, each of whose entries is one.]

34. (a) In the accompanying figure, the area of the triangle ABC can be expressed as

area ABC = area ADEC + area CEFB - area ADFB

Use this and the fact that the area of a trapezoid equals 1/2 the altitude times the sum of the parallel
sides to show that

area ABC = 1/2 | x1  y1  1 |
               | x2  y2  1 |
               | x3  y3  1 |

[Note: In the derivation of this formula, the vertices are labeled such that the triangle is traced
counterclockwise proceeding from (x1, y1) to (x2, y2) to (x3, y3). For a clockwise orientation, the
determinant above yields the negative of the area.]

(b) Use the result in (a) to find the area of the triangle with vertices (3, 3), (4, 0), (-2, -1).
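
The determinant formula in part (a) can be tried out numerically. The following is a minimal Python sketch, not part of the text; the function names are our own, and the sample triangle is an illustrative choice rather than the one in part (b). The 3 x 3 determinant is expanded by cofactors along its third column.

```python
def signed_triangle_area(p1, p2, p3):
    """Half of the determinant whose rows are (x_i, y_i, 1).

    Positive when p1 -> p2 -> p3 is traced counterclockwise,
    negative when the vertices are traced clockwise.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Cofactor expansion along the third column (all entries equal 1).
    det = (x2 * y3 - x3 * y2) - (x1 * y3 - x3 * y1) + (x1 * y2 - x2 * y1)
    return det / 2


def triangle_area(p1, p2, p3):
    """Area regardless of the orientation of the vertices."""
    return abs(signed_triangle_area(p1, p2, p3))


# A right triangle with legs 4 and 3, traced counterclockwise.
print(signed_triangle_area((0, 0), (4, 0), (0, 3)))
# The same triangle traced clockwise gives the negative value.
print(signed_triangle_area((0, 0), (0, 3), (4, 0)))
```

Taking the absolute value, as triangle_area does, removes the dependence on how the vertices are labeled.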




35. Use the fact that 21,375, 38,798, 34,162, 40,223, and 79,154 are all divisible by 19 to show that 

| 2  1  3  7  5 |
| 3  8  7  9  8 |
| 3  4  1  6  2 |
| 4  0  2  2  3 |
| 7  9  1  5  4 |

is divisible by 19 without directly evaluating the determinant. 

36. Without directly evaluating the determinant, show that

| sin α   cos α   sin(α + δ) |
| sin β   cos β   sin(β + δ) |  =  0
| sin γ   cos γ   sin(γ + δ) |

CHAPTER 3

Euclidean Vector Spaces



CHAPTER CONTENTS 

3.1. Vectors in 2-Space, 3-Space, and n-Space

3.2. Norm, Dot Product, and Distance in R^n

3.3. Orthogonality 

3.4. The Geometry of Linear Systems 

3.5. Cross Product 



INTRODUCTION 



Engineers and physicists distinguish between two types of physical quantities — scalars, 
which are quantities that can be described by a numerical value alone, and vectors, which 
are quantities that require both a number and a direction for their complete physical 
description. For example, temperature, length, and speed are scalars because they can be 
fully described by a number that tells "how much" — a temperature of 20°C, a length of 5 
cm, or a speed of 75 km/h. In contrast, velocity and force are vectors because they require 
a number that tells "how much" and a direction that tells "which way" — say, a boat 
moving at 10 knots in a direction 45° northeast, or a force of 100 lb acting vertically. 
Although the notions of vectors and scalars that we will study in this text have their 
origins in physics and engineering, we will be more concerned with using them to build 
mathematical structures and then applying those structures to such diverse fields as 
genetics, computer science, economics, telecommunications, and environmental science.

3.1 Vectors in 2-Space, 3-Space, and n-Space

Linear algebra is concerned with two kinds of mathematical objects, "matrices" and "vectors." We are already 
familiar with the basic ideas about matrices, so in this section we will introduce some of the basic ideas about 
vectors. As we progress through this text we will see that vectors and matrices are closely related and that 
much of linear algebra is concerned with that relationship. 

Geometric Vectors 

Engineers and physicists represent vectors in two dimensions (also called 2-space) or in three dimensions 
(also called 3-space) by arrows. The direction of the arrowhead specifies the direction of the vector and the 
length of the arrow specifies the magnitude. Mathematicians call these geometric vectors. The tail of the 
arrow is called the initial point of the vector and the tip the terminal point (Figure 3.1.1). 

Figure 3.1.1 [An arrow drawn from the initial point to the terminal point.]

In this text we will denote vectors in boldface type such as a, b, v, w, and x, and we will denote scalars in
lowercase italic type such as a, k, v, w, and x. When we want to indicate that a vector v has initial point A and
terminal point B, then, as shown in Figure 3.1.2, we will write

v = AB (with an arrow over AB)

Figure 3.1.2

Vectors with the same length and direction, such as those in Figure 3.1.3, are said to be equivalent. Since we 
want a vector to be determined solely by its length and direction, equivalent vectors are regarded to be the 
same vector even though they may be in different positions. Equivalent vectors are also said to be equal, 
which we indicate by writing

v = w



Figure 3.1.3 



The vector whose initial and terminal points coincide has length zero, so we call this the zero vector and 
denote it by 0. The zero vector has no natural direction, so we will agree that it can be assigned any direction 
that is convenient for the problem at hand. 



Vector Addition

There are a number of important algebraic operations on vectors, all of which have their origin in laws of
physics.

Parallelogram Rule for Vector Addition

If v and w are vectors in 2-space or 3-space that are positioned so their initial points coincide, then the
two vectors form adjacent sides of a parallelogram, and the sum v + w is the vector represented by
the arrow from the common initial point of v and w to the opposite vertex of the parallelogram
(Figure 3.1.4a).

Figure 3.1.4

Here is another way to form the sum of two vectors. 



Triangle Rule for Vector Addition

If v and w are vectors in 2-space or 3-space that are positioned so the initial point of w is at the
terminal point of v, then the sum v + w is represented by the arrow from the initial point of v to the
terminal point of w (Figure 3.1.4b).



In Figure 3.1.4c we have constructed the sums v + w and w + v by the triangle rule. This construction makes
it evident that 

v + w = w + v (1) 

and that the sum obtained by the triangle rule is the same as the sum obtained by the parallelogram rule. 
Vector addition can also be viewed as a process of translating points. 

Vector Addition Viewed as Translation

If v, w, and v + w are positioned so their initial points coincide, then the terminal point of v + w can
be viewed in two ways:

1. The terminal point of v + w is the point that results when the terminal point of v is translated in
the direction of w by a distance equal to the length of w (Figure 3.1.5a).

2. The terminal point of v + w is the point that results when the terminal point of w is translated in
the direction of v by a distance equal to the length of v (Figure 3.1.5b).

Accordingly, we say that v + w is the translation of v by w or, alternatively, the translation of w by v.





Figure 3.1.5 



Vector Subtraction 

In ordinary arithmetic we can write a - b = a + (-b), which expresses subtraction in terms of addition.
There is an analogous idea in vector arithmetic. 



Vector Subtraction 



The negative of a vector v, denoted by -v, is the vector that has the same length as v but is
oppositely directed (Figure 3.1.6a), and the difference of v from w, denoted by w - v, is taken to be
the sum

w - v = w + (-v) (2)




Figure 3.1.6 



The difference of v from w can be obtained geometrically by the parallelogram method shown in Figure
3.1.6b, or more directly by positioning w and v so their initial points coincide and drawing the vector from the
terminal point of v to the terminal point of w (Figure 3.1.6c).

Scalar Multiplication 

Sometimes there is a need to change the length of a vector or change its length and reverse its direction. This
is accomplished by a type of multiplication in which vectors are multiplied by scalars. As an example, the
product 2v denotes the vector that has the same direction as v but twice the length, and the product -2v
denotes the vector that is oppositely directed to v and has twice the length. Here is the general result.

Scalar Multiplication

If v is a nonzero vector in 2-space or 3-space, and if k is a nonzero scalar, then we define the scalar
product of v by k to be the vector whose length is |k| times the length of v and whose direction is the
same as that of v if k is positive and opposite to that of v if k is negative. If k = 0 or v = 0, then we
define kv to be 0.



Figure 3.1.7 shows the geometric relationship between a vector v and some of its scalar multiples. In
particular, observe that (-1)v has the same length as v but is oppositely directed; therefore,

(-1)v = -v (3)




Figure 3.1.7 



Parallel and Collinear Vectors 

Suppose that v and w are vectors in 2-space or 3-space with a common initial point. If one of the vectors is a
scalar multiple of the other, then the vectors lie on a common line, so it is reasonable to say that they are
collinear (Figure 3.1.8a). However, if we translate one of the vectors, as indicated in Figure 3.1.8b, then the
vectors are parallel but no longer collinear. This creates a linguistic problem because translating a vector does
not change it. The only way to resolve this problem is to agree that the terms parallel and collinear mean the
same thing when applied to vectors. Although the vector 0 has no clearly defined direction, we will regard it
to be parallel to all vectors when convenient.
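
Once vectors are written in components, the condition "one vector is a scalar multiple of the other" can be checked mechanically. Here is a minimal Python sketch under our own assumptions: the name is_parallel is hypothetical, and the comparison is exact, so it is meant for integer or rational components. It uses the fact that one of u and v is a scalar multiple of the other exactly when u[i]*v[j] = u[j]*v[i] for every pair of positions i, j, which also treats the zero vector as parallel to every vector.

```python
def is_parallel(u, v):
    """Return True if one of u, v is a scalar multiple of the other.

    Exact comparison, so intended for integer or rational components.
    The zero vector is reported as parallel to every vector.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    n = len(u)
    # Every 2 x 2 'cross term' must vanish.
    return all(
        u[i] * v[j] == u[j] * v[i]
        for i in range(n)
        for j in range(i + 1, n)
    )


u = (-2, 1, 0, 3, 5, 1)
print(is_parallel(u, (4, -2, 0, -6, -10, -2)))  # this vector is -2u
print(is_parallel(u, (4, 2, 0, 6, 10, 2)))
print(is_parallel(u, (0, 0, 0, 0, 0, 0)))
```

The sample vectors are the ones that appear in Exercise 23 of Exercise Set 3.1.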




Figure 3.1.8 



Sums of Three or More Vectors 

Vector addition satisfies the associative law for addition, meaning that when we add three vectors, say u, v,
and w, it does not matter which two we add first; that is,

u + (v + w) = (u + v) + w

It follows from this that there is no ambiguity in the expression u + v + w because the same result is obtained
no matter how the vectors are grouped.

A simple way to construct u + v + w is to place the vectors "tip to tail" in succession and then draw the
vector from the initial point of u to the terminal point of w (Figure 3.1.9a). The tip-to-tail method also works
for four or more vectors (Figure 3.1.9b). The tip-to-tail method also makes it evident that if u, v, and w are
vectors in 3-space with a common initial point, then u + v + w is the diagonal of the parallelepiped that has
the three vectors as adjacent sides (Figure 3.1.9c).





Figure 3.1.9 



Vectors in Coordinate Systems 

Up until now we have discussed vectors without reference to a coordinate system. However, as we will soon 
see, computations with vectors are much simpler to perform if a coordinate system is present to work with. 

The component forms of the zero vector are 
0 = (0, 0) in 2-space and 0 = (0, 0, 0) in 
3-space. 

If a vector v in 2-space or 3-space is positioned with its initial point at the origin of a rectangular coordinate
system, then the vector is completely determined by the coordinates of its terminal point (Figure 3.1.10). We
call these coordinates the components of v relative to the coordinate system. We will write v = (v1, v2) to
denote a vector v in 2-space with components (v1, v2), and v = (v1, v2, v3) to denote a vector v in 3-space
with components (v1, v2, v3).




Figure 3.1.10 



It should be evident geometrically that two vectors in 2-space or 3-space are equivalent if and only if they
have the same terminal point when their initial points are at the origin. Algebraically, this means that two
vectors are equivalent if and only if their corresponding components are equal. Thus, for example, the vectors

v = (v1, v2, v3) and w = (w1, w2, w3)

in 3-space are equivalent if and only if

v1 = w1, v2 = w2, v3 = w3



Remark It may have occurred to you that an ordered pair (v1, v2) can represent either a vector with
components v1 and v2 or a point with components v1 and v2 (and similarly for ordered triples). Both are valid
geometric interpretations, so the appropriate choice will depend on the geometric viewpoint that we want to
emphasize (Figure 3.1.11).

Figure 3.1.11 The ordered pair (v1, v2) can represent a point or a vector.



Vectors Whose Initial Point Is Not at the Origin 

It is sometimes necessary to consider vectors whose initial points are not at the origin. If P1P2 denotes the
vector with initial point P1(x1, y1) and terminal point P2(x2, y2), then the components of this vector are
given by the formula

P1P2 = (x2 - x1, y2 - y1) (4)

That is, the components of P1P2 are obtained by subtracting the coordinates of the initial point from the
coordinates of the terminal point. For example, in Figure 3.1.12 the vector P1P2 is the difference of vectors
OP2 and OP1, so

P1P2 = OP2 - OP1 = (x2, y2) - (x1, y1) = (x2 - x1, y2 - y1)

As you might expect, the components of a vector in 3-space that has initial point P1(x1, y1, z1) and terminal
point P2(x2, y2, z2) are given by

P1P2 = (x2 - x1, y2 - y1, z2 - z1) (5)



Figure 3.1.12 
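
Formulas (4) and (5) translate directly into a short computation. The following Python sketch is ours and not part of the text; the function name components is a hypothetical choice. It subtracts the coordinates of the initial point from those of the terminal point, and it works for points with any number of coordinates.

```python
def components(initial, terminal):
    """Components of the vector from `initial` to `terminal`.

    Implements Formulas (4) and (5): terminal coordinates minus
    initial coordinates, position by position.
    """
    if len(initial) != len(terminal):
        raise ValueError("points must have the same number of coordinates")
    return tuple(t - i for i, t in zip(initial, terminal))


# The points of Exercise 7 in Exercise Set 3.1.
print(components((3, 5), (2, 8)))          # (-1, 3)
print(components((5, -2, 1), (2, 4, 2)))   # (-3, 6, 1)
```

Both results agree with the answers listed for Exercise 7.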



EXAMPLE 1 Finding the Components of a Vector 




4 ) and terminal point 



12) 



n-Space 



The idea of using ordered pairs and triples of real numbers to represent points in two-dimensional space and 
three-dimensional space was well known in the eighteenth and nineteenth centuries. By the dawn of the 
twentieth century, mathematicians and physicists were exploring the use of "higher-dimensional" spaces in 
mathematics and physics. Today, even the layman is familiar with the notion of time as a fourth dimension, an 
idea used by Albert Einstein in developing the general theory of relativity. Today, physicists working in the 
field of "string theory" commonly use 11-dimensional space in their quest for a unified theory that will
explain how the fundamental forces of nature work. Much of the remaining work in this section is concerned
with extending the notion of space to n dimensions.

To explore these ideas further, we start with some terminology and notation. The set of all real numbers can
be viewed geometrically as a line. It is called the real line and is denoted by R or R^1. The superscript
reinforces the intuitive idea that a line is one-dimensional. The set of all ordered pairs of real numbers (called
2-tuples) and the set of all ordered triples of real numbers (called 3-tuples) are denoted by R^2 and R^3,
respectively. The superscript reinforces the idea that the ordered pairs correspond to points in the plane
(two-dimensional) and ordered triples to points in space (three-dimensional). The following definition extends
this idea.



Remark You can think of the numbers in an n-tuple (v1, v2, ..., vn) as either the coordinates of a
generalized point or the components of a generalized vector, depending on the geometric image you want to
bring to mind. The choice makes no difference mathematically, since it is the algebraic properties of n-tuples
that are of concern.



DEFINITION 1 



If n is a positive integer, then an ordered n-tuple is a sequence of n real numbers (v1, v2, ..., vn).
The set of all ordered n-tuples is called n-space and is denoted by R^n.



Here are some typical applications that lead to n-tuples.

Experimental Data A scientist performs an experiment and makes n numerical measurements each time
the experiment is performed. The result of each experiment can be regarded as a vector
y = (y1, y2, ..., yn) in R^n in which y1, y2, ..., yn are the measured values.

Storage and Warehousing A national trucking company has 15 depots for storing and servicing its trucks.
At each point in time the distribution of trucks in the service depots can be described by a 15-tuple
x = (x1, x2, ..., x15) in which x1 is the number of trucks in the first depot, x2 is the number in the second
depot, and so forth.

Electrical Circuits A certain kind of processing chip is designed to receive four input voltages and
produce three output voltages in response. The input voltages can be regarded as vectors in R^4 and the
output voltages as vectors in R^3. Thus, the chip can be viewed as a device that transforms an input vector
v = (v1, v2, v3, v4) in R^4 into an output vector w = (w1, w2, w3) in R^3.

Graphical Images One way in which color images are created on computer screens is by assigning each
pixel (an addressable point on the screen) three numbers that describe the hue, saturation, and brightness
of the pixel. Thus, a complete color image can be viewed as a set of 5-tuples of the form v = (x, y, h, s, b)
in which x and y are the screen coordinates of a pixel and h, s, and b are its hue, saturation, and brightness.

Economics One approach to economic analysis is to divide an economy into sectors (manufacturing,
services, utilities, and so forth) and measure the output of each sector by a dollar value. Thus, in an
economy with 10 sectors the economic output of the entire economy can be represented by a 10-tuple
s = (s1, s2, ..., s10) in which the numbers s1, s2, ..., s10 are the outputs of the individual sectors.

Mechanical Systems Suppose that six particles move along the same coordinate line so that at time t their
coordinates are x1, x2, ..., x6 and their velocities are v1, v2, ..., v6, respectively. This information can be
represented by the vector

v = (x1, x2, x3, x4, x5, x6, v1, v2, v3, v4, v5, v6, t)

in R^13. This vector is called the state of the particle system at time t.



Albert Einstein (1879-1955) 

Historical Note The German-born physicist Albert Einstein immigrated to the United States in
1935, where he settled at Princeton University. Einstein spent the last three decades of his life
working unsuccessfully at producing a unified field theory that would establish an underlying link
between the forces of gravity and electromagnetism. Recently, physicists have made progress on the
problem using a framework known as string theory. In this theory the smallest, indivisible
components of the Universe are not particles but loops that behave like vibrating strings. Whereas
Einstein's space-time universe was four-dimensional, strings reside in an 11-dimensional world that is
the focus of current research.
[Image: © Bettmann/© Corbis]



Operations on Vectors in R^n

Our next goal is to define useful operations on vectors in R^n. These operations will all be natural extensions
of the familiar operations on vectors in R^2 and R^3. We will denote a vector v in R^n using the notation

v = (v1, v2, ..., vn)

and we will call 0 = (0, 0, ..., 0) the zero vector.

We noted earlier that in R^2 and R^3 two vectors are equivalent (equal) if and only if their corresponding
components are the same. Thus, we make the following definition.

DEFINITION 2

Vectors v = (v1, v2, ..., vn) and w = (w1, w2, ..., wn) in R^n are said to be equivalent (also called
equal) if

v1 = w1, v2 = w2, ..., vn = wn

We indicate this by writing v = w.



EXAMPLE 2 Equality of Vectors

(a, b, c, d) = (1, -4, 2, 7)
if and only if a = 1, b = -4, c = 2, and d = 7.



Our next objective is to define the operations of addition, subtraction, and scalar multiplication for vectors in
R^n. To motivate these ideas, we will consider how these operations can be performed on vectors in R^2 using
components. By studying Figure 3.1.13 you should be able to deduce that if v = (v1, v2) and w = (w1, w2),
then

v + w = (v1 + w1, v2 + w2) (6)

kv = (kv1, kv2) (7)

In particular, it follows from 7 that

-v = (-1)v = (-v1, -v2) (8)

and hence that

w - v = w + (-v) = (w1 - v1, w2 - v2) (9)




DEFINITION 3

If v = (v1, v2, ..., vn) and w = (w1, w2, ..., wn) are vectors in R^n, and if k is any scalar, then we
define

v + w = (v1 + w1, v2 + w2, ..., vn + wn) (10)

kv = (kv1, kv2, ..., kvn) (11)

-v = (-v1, -v2, ..., -vn) (12)

w - v = w + (-v) = (w1 - v1, w2 - v2, ..., wn - vn) (13)



In words, vectors are added (or subtracted) by 
adding (or subtracting) their corresponding 
components, and a vector is multiplied by a 
scalar by multiplying each component by that 
scalar. 

EXAMPLE 3 Algebraic Operations Using Components

If v = (1, -3, 2) and w = (4, 2, 1), then

v + w = (5, -1, 3), 2v = (2, -6, 4)

-w = (-4, -2, -1), v - w = v + (-w) = (-3, -5, 1)
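
The componentwise operations of Definition 3 are easy to carry out by machine. Below is a minimal Python sketch, offered only as an illustration; the function names add, scale, negate, and subtract are our own, and vectors are represented as tuples of numbers.

```python
def add(v, w):
    """v + w, computed component by component (Formula 10)."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(v, w))


def scale(k, v):
    """kv, each component multiplied by the scalar k (Formula 11)."""
    return tuple(k * a for a in v)


def negate(v):
    """-v = (-1)v (Formula 12)."""
    return scale(-1, v)


def subtract(w, v):
    """w - v = w + (-v) (Formula 13)."""
    return add(w, negate(v))


# The vectors of Example 3.
v = (1, -3, 2)
w = (4, 2, 1)
print(add(v, w))        # (5, -1, 3)
print(scale(2, v))      # (2, -6, 4)
print(negate(w))        # (-4, -2, -1)
print(subtract(v, w))   # (-3, -5, 1)
```

Note that subtract is defined through add and negate, mirroring the way Formula 13 defines the difference as a sum.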



The following theorem summarizes the most important properties of vector operations. 

THEOREM 3.1.1

If u, v, and w are vectors in R^n, and if k and m are scalars, then:

(a) u + v = v + u

(b) (u + v) + w = u + (v + w)

(c) u + 0 = 0 + u = u

(d) u + (-u) = 0

(e) k(u + v) = ku + kv

(f) (k + m)u = ku + mu

(g) k(mu) = (km)u

(h) 1u = u

We will prove part (b) and leave some of the other proofs as exercises. 

Proof (b) Let u = (u1, u2, ..., un), v = (v1, v2, ..., vn), and w = (w1, w2, ..., wn). Then

(u + v) + w = ((u1, u2, ..., un) + (v1, v2, ..., vn)) + (w1, w2, ..., wn)

= (u1 + v1, u2 + v2, ..., un + vn) + (w1, w2, ..., wn) [Vector addition]

= ((u1 + v1) + w1, (u2 + v2) + w2, ..., (un + vn) + wn) [Vector addition]

= (u1 + (v1 + w1), u2 + (v2 + w2), ..., un + (vn + wn)) [Regroup]

= (u1, u2, ..., un) + (v1 + w1, v2 + w2, ..., vn + wn) [Vector addition]

= u + (v + w)
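
A numerical spot check is no substitute for a proof, but it can help catch a misremembered property. The following Python sketch, which is ours and not part of the text, tests the eight parts of Theorem 3.1.1 for one particular choice of vectors and scalars. The helper names are hypothetical, vectors are assumed to be tuples of integers so that the comparisons are exact, and the sample vectors are borrowed from Exercise 15 of Exercise Set 3.1.

```python
def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def scale(k, v):
    return tuple(k * a for a in v)


def check_theorem_3_1_1(u, v, w, k, m):
    """Return True if parts (a)-(h) all hold for these particular inputs."""
    zero = tuple(0 for _ in u)
    neg_u = scale(-1, u)
    return all([
        add(u, v) == add(v, u),                                # (a)
        add(add(u, v), w) == add(u, add(v, w)),                # (b)
        add(u, zero) == u and add(zero, u) == u,               # (c)
        add(u, neg_u) == zero,                                 # (d)
        scale(k, add(u, v)) == add(scale(k, u), scale(k, v)),  # (e)
        scale(k + m, u) == add(scale(k, u), scale(m, u)),      # (f)
        scale(k, scale(m, u)) == scale(k * m, u),              # (g)
        scale(1, u) == u,                                      # (h)
    ])


u = (-3, 2, 1, 0)
v = (4, 7, -3, 2)
w = (5, -2, 8, 1)
print(check_theorem_3_1_1(u, v, w, k=2, m=-3))
```

With floating-point components the equalities in parts (b), (e), (f), and (g) can fail by rounding error, which is why the sketch restricts itself to integers.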



The following additional properties of vectors in R^n can be deduced easily by expressing the vectors in terms
of components (verify).

THEOREM 3.1.2

If v is a vector in R^n and k is a scalar, then:

(a) 0v = 0

(b) k0 = 0

(c) (-1)v = -v



Calculating Without Components 

One of the powerful consequences of Theorems 3.1.1 and 3.1.2 is that they allow calculations to be performed
without expressing the vectors in terms of components. For example, suppose that x, a, and b are vectors in
R^n, and we want to solve the vector equation x + a = b for the vector x without using components. We could
proceed as follows:

x + a = b                      [Given]

(x + a) + (-a) = b + (-a)      [Add the negative of a to both sides]

x + (a + (-a)) = b - a         [Part (b) of Theorem 3.1.1]

x + 0 = b - a                  [Part (d) of Theorem 3.1.1]

x = b - a                      [Part (c) of Theorem 3.1.1]

While this method is obviously more cumbersome than computing with components in R^n, it will become
important later in the text where we will encounter more general kinds of vectors.



Linear Combinations 



Addition, subtraction, and scalar multiplication are frequently used in combination to form new vectors. For
example, if v1, v2, and v3 are vectors in R^n, then the vectors

u = 2v1 + 3v2 + v3 and w = 7v1 - 6v2 + 8v3

are formed in this way. In general, we make the following definition.

DEFINITION 4

If w is a vector in R^n, then w is said to be a linear combination of the vectors v1, v2, ..., vr in R^n if it
can be expressed in the form

w = k1v1 + k2v2 + ... + krvr (14)

where k1, k2, ..., kr are scalars. These scalars are called the coefficients of the linear combination. In
the case where r = 1, Formula 14 becomes w = k1v1, so that a linear combination of a single vector
is just a scalar multiple of that vector.



Note that this definition of a linear combination 
is consistent with that given in the context of 
matrices (see Definition 6 in Section 1.3). 
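
Formula 14 can be evaluated component by component. The following Python sketch is an illustration of ours rather than part of the text; the function name linear_combination and its argument order are assumptions, and vectors are again represented as tuples.

```python
def linear_combination(coefficients, vectors):
    """Return k1*v1 + k2*v2 + ... + kr*vr as in Formula 14."""
    if not vectors or len(coefficients) != len(vectors):
        raise ValueError("need one coefficient per vector, and at least one vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must lie in the same R^n")
    # The i-th component of the result is the sum of k_j times the
    # i-th component of v_j.
    return tuple(
        sum(k * v[i] for k, v in zip(coefficients, vectors))
        for i in range(n)
    )


# The vectors and scalars of Exercise 25 in Exercise Set 3.1.
u = (1, -1, 3, 5)
v = (2, 1, 0, -3)
print(linear_combination([3, -1], [u, v]))   # (1, -4, 9, 18)

# With r = 1 the result is just a scalar multiple.
print(linear_combination([2], [(1, 2)]))     # (2, 4)
```

The first call confirms that the scalars a = 3 and b = -1 given as the answer to Exercise 25 produce the required vector.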



Application of Linear Combinations to Color Models 

Colors on computer monitors are commonly based on what is called the RGB color model. Colors in 
this system are created by adding together percentages of the primary colors red (R), green (G), and 
blue (B). One way to do this is to identify the primary colors with the vectors 

r= (1, 0, 0) (pure red), 

g= (0, 1, 0) (pure green), 

b= (0, 0, 1) (pure blue) 

in R^3 and to create all other colors by forming linear combinations of r, g, and b using coefficients
between 0 and 1, inclusive; these coefficients represent the percentage of each pure color in the mix.
The set of all such color vectors is called RGB space or the RGB color cube (Figure 3.1.14). Thus,
each color vector c in this cube is expressible as a linear combination of the form

c = k1r + k2g + k3b
= k1(1, 0, 0) + k2(0, 1, 0) + k3(0, 0, 1)
= (k1, k2, k3)

where 0 ≤ ki ≤ 1. As indicated in the figure, the corners of the cube represent the pure primary colors
together with the colors black, white, magenta, cyan, and yellow. The vectors along the diagonal
running from black to white correspond to shades of gray.
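
As a small illustration of the color model, the sketch below forms a color vector as a linear combination of r, g, and b. It is a hypothetical Python example of ours; the function name rgb_color is an assumption, and the range check simply enforces the condition that each coefficient lies between 0 and 1, inclusive.

```python
R = (1, 0, 0)   # pure red
G = (0, 1, 0)   # pure green
B = (0, 0, 1)   # pure blue


def rgb_color(k1, k2, k3):
    """Return c = k1*r + k2*g + k3*b for coefficients in [0, 1]."""
    for k in (k1, k2, k3):
        if not 0 <= k <= 1:
            raise ValueError("each coefficient must lie between 0 and 1")
    return tuple(k1 * r + k2 * g + k3 * b for r, g, b in zip(R, G, B))


print(rgb_color(1, 0, 1))        # magenta corner of the cube
print(rgb_color(0.5, 0.5, 0.5))  # a shade of gray on the black-white diagonal
```

Because r, g, and b are the standard unit vectors, the result always has components (k1, k2, k3), in agreement with the computation above.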



Figure 3.1.14 [RGB color cube; labeled corners include Black (0, 0, 0), Red (1, 0, 0), Blue (0, 0, 1),
Magenta (1, 0, 1), and Yellow (1, 1, 0).]



Alternative Notations for Vectors 



Up to now we have been writing vectors in R^n using the notation

v = (v1, v2, ..., vn) (15)

We call this the comma-delimited form. However, since a vector in R^n is just a list of its n components in a
specific order, any notation that displays those components in the correct order is a valid way of representing
the vector. For example, the vector in 15 can be written as

v = [v1 v2 ... vn] (16)

which is called row-matrix form, or as

    | v1  |
v = | v2  |
    | ... |
    | vn  | (17)

which is called column-matrix form. The choice of notation is often a matter of taste or convenience, but
sometimes the nature of a problem will suggest a preferred notation. Notations 15, 16, and 17 will all be used
at various places in this text.



Concept Review 

• Geometric vector 

• Direction 

• Length 

• Initial point 

• Terminal point 

• Equivalent vectors 



• Zero vector 

• Vector addition: parallelogram rule and triangle rule 

• Vector subtraction 

• Negative of a vector 

• Scalar multiplication 

• Collinear (i.e., parallel) vectors 

• Components of a vector 
• Coordinates of a point

• n-tuple 

• n-space 

• Vector operations in n-space: addition, subtraction, scalar multiplication

• Linear combination of vectors 

Skills 

• Perform geometric operations on vectors: addition, subtraction, and scalar multiplication. 

• Perform algebraic operations on vectors: addition, subtraction, and scalar multiplication. 

• Determine whether two vectors are equivalent. 

• Determine whether two vectors are collinear. 

• Sketch vectors whose initial and terminal points are given. 

• Find components of a vector whose initial and terminal points are given. 

• Prove basic algebraic properties of vectors (Theorems 3.1.1 and 3.1.2). 



Exercise Set 3.1 

In Exercises 1-2, draw a coordinate system (as in Figure 3.1.10) and locate the points whose coordinates
are given.

1. (a) (3, 4, 5)

(b) (-3, 4, 5) 

(c) (3,-4,5) 

(d) (3, 4, -5) 

(e) (-3,-4,5) 

(f) (-3,4,-5) 

Answer:

(a)-(f) [Plots of the points not reproduced.]

2. (a) (0, 3, -3)

(b) (3,-3,0) 

(c) (-3,0,0) 

(d) (3, 0, 3) 

(e) (0,0,-3) 

(f) (0,3,0) 



In Exercises 3-4, sketch the following vectors with the initial points located at the origin.

3. (a) v1 = (3, 6)

(b) v2 = (-4, -8)

(c) v3 = (-4, -3)

(d) v4 = (3, 4, 5)

(e) v5 = (3, 3, 0)

(f) v6 = (-1, 0, 2)

Answer:

(a)-(f) [Sketches not reproduced.]

4. (a) v1 = (5, -4)

(b) v2 = (3, 0)

(c) v3 = (0, -7)

(d) v4 = (0, 0, -3)

(e) v5 = (0, 4, -1)

(f) v6 = (2, 2, 2)



In Exercises 5-6, sketch the following vectors with the initial points located at the origin. 

5. (a) P1(4, 8), P2(3, 7)

(b) P1(3, -5), P2(-4, -7)

(c) P1(3, -7, 2), P2(-2, 5, -4)

Answer:

(a)-(c) [Sketches not reproduced.]

6. (a) P1(-5, 0), P2(-3, 1)

(b) P1(0, 0), P2(3, 4)

(c) P1(-1, 0, 2), P2(0, -1, 0)

(d) P1(2, 2, 2), P2(0, 0, 0)



In Exercises 7-8, find the components of the vector P1P2.

7. (a) P1(3, 5), P2(2, 8)

(b) P1(5, -2, 1), P2(2, 4, 2)

Answer:

(a) P1P2 = (-1, 3)

(b) P1P2 = (-3, 6, 1)

8. (a) P1(-6, 2), P2(-4, -1)

(b) P1(0, 0, 0), P2(-1, 6, 1)

9. (a) Find the terminal point of the vector that is equivalent to u = (1, 2) and whose initial point is A(1, 1).

(b) Find the initial point of the vector that is equivalent to u = (1, 1, 3) and whose terminal point is
B(-1, -1, 2).

Answer:

(a) The terminal point is B(2, 3).

(b) The initial point is A(-2, -2, -1).

10. (a) Find the initial point of the vector that is equivalent to u = (1, 2) and whose terminal point is B(2,

(b) Find the terminal point of the vector that is equivalent to u = (1, 1, 3) and whose initial point is
A(0, 2, 0).

11. Find a nonzero vector u with terminal point Q(3, 0, -5) such that

(a) u has the same direction as v = (4, -2, -1).

(b) u is oppositely directed to v = (4, -2, -1).

Answer:

(a) P = (-1, 2, -4) is one possible initial point.

(b) P = (7, -2, -6) is one possible initial point.

12. Find a nonzero vector u with initial point P(-1, 3, -5) such that

(a) u has the same direction as v = (6, 7, -3).

(b) u is oppositely directed to v = (6, 7, -3).

13. Let u = (4, -1), v = (0, 5), and w = (-3, -3). Find the components of

(a) u + w

(b) v - 3u

(c) 2(u - 5w)

(d) 3v - 2(u + 2w)

(e) -3(w - 2u + v)

(f) (-2u - v) - 5(v + 3w)

Answer:

(a) u + w = (1, -4)

(b) v - 3u = (-12, 8)

(c) 2(u - 5w) = (38, 28)

(d) 3v - 2(u + 2w) = (4, 29)

(e) -3(w - 2u + v) = (33, -12)

(f) (-2u - v) - 5(v + 3w) = (37, 17)

14. Let u = (-3, 1, 2), v = (4, 0, -8), and w = (6, -1, -4). Find the components of

(a) v - w

(b) 6u + 2v

(c) -v + u

(d) 5(v - 4u)

(e) -3(v - 8w)

(f) (2u - 7w) - (8v + u)

15. Let u = (-3, 2, 1, 0), v = (4, 7, -3, 2), and w = (5, -2, 8, 1). Find the components of

(a) v - w

(b) 2u + 7v

(c) -u + (v - 4w)

(d) 6(u - 3v)

(e) -v - w

(f) (6v - w) - (4u + v)

Answer:

(a) (-1, 9, -11, 1)

(b) (22, 53, -19, 14)

(c) (-13, 13, -36, -2)

(d) (-90, -114, 60, -36)

(e) (-9, -5, -5, -3)

(f) (27, 29, -27, 9)

16. Let u, v, and w be the vectors in Exercise 15. Find the vector x that satisfies 5x - 2v = 2(w - 5x).

17. Let u = (5, -1, 0, 3, -3), v = (-1, -1, 7, 2, 0), and w = (-4, 2, -3, -5, 2). Find the
components of

(a) w - u

(b) 2v + 3u

(c) -w + 3(v - u)

(d) 5(-v + 4u - w)

(e) -2(3w + v) + (2u + w)

(f) 1/2(w - 5v + 2u) + v

Answer:

(a) w - u = (-9, 3, -3, -8, 5)

(b) 2v + 3u = (13, -5, 14, 13, -9)

(c) -w + 3(v - u) = (-14, -2, 24, 2, 7)

(d) 5(-v + 4u - w) = (125, -25, -20, 75, -70)

(e) -2(3w + v) + (2u + w) = (32, -10, 1, 27, -16)

(f) 1/2(w - 5v + 2u) + v = (9/2, 3/2, -12, -5/2, -2)



18. Let u = (1, 2, -3, 5, 0), v = (0, 4, -1, 1, 2), and w = (7, 1, -4, -2, 3). Find the components of

(a) v + w

(b) 3(2u - v)

(c) (3u - v) - (2u + 4w)

19. Let u = (-3, 1, 2, 4, 4), v = (4, 0, -8, 1, 2), and w = (6, -1, -4, 3, -5). Find the components
of

(a) v - w

(b) 6u + 2v

(c) (2u - 7w) - (8v + u)

Answer:

(a) v - w = (-2, 1, -4, -2, 7)

(b) 6u + 2v = (-10, 6, -4, 26, 28)

(c) (2u - 7w) - (8v + u) = (-77, 8, 94, -25, 23)

20. Let u, v, and w be the vectors in Exercise 18. Find the components of the vector x that satisfies the
equation 3u + v - 2w = 3x + 2w.

21. Let u, v, and w be the vectors in Exercise 19. Find the components of the vector x that satisfies the
equation 2u - v + x = 7x + w.



Answer: 




23. Which of the following vectors in R^6 are parallel to u = (-2, 1, 0, 3, 5, 1)?

(a) (4, 2, 0, 6, 10, 2)

(b) (4, -2, 0, -6, -10, -2)

(c) (0, 0, 0, 0, 0, 0)



Answer: 



(a) Not parallel 

(b) Parallel 



(c) Parallel 

24. Let u = (2, 1, 0, 1, -1) and v = (-2, 3, 1, 0, 2). Find scalars a and b so that
au + bv = (-8, 8, 3, -1, 7).

25. Let u = (1, -1, 3, 5) and v = (2, 1, 0, -3). Find scalars a and b so that au + bv = (1, -4, 9, 18).

Answer:

a = 3, b = -1

26. Find all scalars c1, c2, and c3 such that

c1(1, 2, 0) + c2(2, 1, 1) + c3(0, 3, 1) = (0, 0, 0)

27. Find all scalars c1, c2, and c3 such that

c1(1, -1, 0) + c2(3, 2, 1) + c3(0, 1, 4) = (-1, 1, 19)

Answer:

c1 = 2, c2 = -1, c3 = 5

28. Find all scalars c1, c2, and c3 such that

c1(-1, 0, 2) + c2(2, 2, -2) + c3(1, -2, 1) = (-6, 12, 4)

29. Let u1 = (-1, 3, 2, 0), u2 = (2, 0, 4, -1), u3 = (7, 1, 1, 4), and u4 = (6, 3, 1, 2). Find scalars c1,
c2, c3, and c4 such that c1u1 + c2u2 + c3u3 + c4u4 = (0, 5, 6, -3).

Answer:

c1 = 1, c2 = 1, c3 = -1, c4 = 1

30. Show that there do not exist scalars c1, c2, and c3 such that

c1(1, 0, 1, 0) + c2(1, 0, -2, 1) + c3(2, 0, 1, 2) = (1, -2, 2, 3)

31. Show that there do not exist scalars c1, c2, and c3 such that

c1(-2, 9, 6) + c2(-3, 2, 1) + c3(1, 7, 5) = (0, 5, 4)

32. Consider Figure 3.1.12. Discuss a geometric interpretation of the vector 

33. Let P be the point (2, 3, - 2) and Q the point (7, - 4, 1) . 

(a) Find the midpoint of the line segment connecting P and Q.

(b) Find the point on the line segment connecting P and Q that is 3/4 of the way from P to Q.

Answer:

(a) (9/2, -1/2, -1/2)

(b) (23/4, -9/4, 1/4)



34. Let P be the point (1, 3, 7). If the point (4, 0, -6) is the midpoint of the line segment connecting P and
Q, what is Q?

35. Prove parts (a), (c), and (d) of Theorem 3.1.1.

36. Prove parts (e)-(h) of Theorem 3.1.1.

37. Prove parts (a)-(c) of Theorem 3.1.2.

True-False Exercises 

In parts (a)-(k) determine whether the statement is true or false, and justify your answer. 

(a) Two equivalent vectors must have the same initial point. 
Answer: 

False 

(b) The vectors (a, b) and (a, b, 0) are equivalent.
Answer: 

False 

(c) If k is a scalar and v is a vector, then v and kv are parallel if and only if k ≥ 0.
Answer: 

False 

(d) The vectors v + (u + w) and (w + v) + u are the same. 
Answer: 

True 

(e) If u + v = u + w, then v = w.
Answer: 

True 

(f) If a and b are scalars such that au + bv = 0, then u and v are parallel vectors.
Answer: 

False 

(g) Collinear vectors with the same length are equal. 
Answer: 

False 

(h) If (a, b, c) + (x, y, z) = (x, y, z), then (a, b, c) must be the zero vector.



Answer: 

True 

(i) If k and m are scalars and u and v are vectors, then 

(k + m)(u + v) = ku + mv

Answer: 

False 

(j) If the vectors v and w are given, then the vector equation 

3(2v-x) = 5x-4w+v 

can be solved for x. 

Answer: 

True 

(k) The linear combinations $a_1v_1 + a_2v_2$ and $b_1v_1 + b_2v_2$ can only be equal if $a_1 = b_1$ and $a_2 = b_2$.
Answer: 

False

3.2 Norm, Dot Product, and Distance in $R^n$

In this section we will be concerned with the notions of length and distance as they relate to vectors. We will
first discuss these ideas in $R^2$ and $R^3$ and then extend them algebraically to $R^n$.



Norm of a Vector 

In this text we will denote the length of a vector v by the symbol $\|v\|$, which is read as the norm of v, the
length of v, or the magnitude of v (the term "norm" being a common mathematical synonym for length). As
suggested in Figure 3.2.1a, it follows from the Theorem of Pythagoras that the norm of a vector $(v_1, v_2)$ in $R^2$
is

$\|v\| = \sqrt{v_1^2 + v_2^2}$ (1)

Similarly, for a vector $(v_1, v_2, v_3)$ in $R^3$, it follows from Figure 3.2.1b and two applications of the Theorem of
Pythagoras that

$\|v\|^2 = (OR)^2 + (RP)^2 = (OQ)^2 + (QR)^2 + (RP)^2 = v_1^2 + v_2^2 + v_3^2$

and hence that

$\|v\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$ (2)

Motivated by the pattern of Formulas 1 and 2 we make the following definition.

DEFINITION 1

If $v = (v_1, v_2, \ldots, v_n)$ is a vector in $R^n$, then the norm of v (also called the length of v or the
magnitude of v) is denoted by $\|v\|$, and is defined by the formula

$\|v\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$ (3)

EXAMPLE 1 Calculating Norms

It follows from Formula 2 that the norm of the vector v = (-3, 2, 1) in $R^3$ is

$\|v\| = \sqrt{(-3)^2 + 2^2 + 1^2} = \sqrt{14}$

and it follows from Formula 3 that the norm of the vector v = (2, -1, 3, -5) in $R^4$ is

$\|v\| = \sqrt{2^2 + (-1)^2 + 3^2 + (-5)^2} = \sqrt{39}$
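The norm formula translates directly into a few lines of code. This is a minimal sketch, assuming a vector is given as a tuple of numbers; the function name `norm` is chosen for illustration.

```python
import math

def norm(v):
    """Euclidean norm: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

# The two vectors of Example 1; the squared norms are 14 and 39.
print(norm((-3, 2, 1)))
print(norm((2, -1, 3, -5)))
```

The same function works for any number of components, which mirrors how the definition extends the 2-space and 3-space formulas to n-tuples.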




Figure 3.2.1 [panels (a) and (b)]



Our first theorem in this section will generalize to $R^n$ the following three familiar facts about vectors in $R^2$ and
$R^3$:

• Distances are nonnegative.

• The zero vector is the only vector of length zero.

• Multiplying a vector by a scalar multiplies its length by the absolute value of that scalar.

It is important to recognize that just because these results hold in $R^2$ and $R^3$ does not guarantee that they hold
in $R^n$; their validity in $R^n$ must be proved using algebraic properties of n-tuples.

THEOREM 3.2.1

If v is a vector in $R^n$, and if k is any scalar, then:

(a) $\|v\| \ge 0$

(b) $\|v\| = 0$ if and only if v = 0

(c) $\|kv\| = |k| \|v\|$



We will prove part (c) and leave (a) and (b) as exercises.

Proof (c) If $v = (v_1, v_2, \ldots, v_n)$, then $kv = (kv_1, kv_2, \ldots, kv_n)$, so

$\|kv\| = \sqrt{(kv_1)^2 + (kv_2)^2 + \cdots + (kv_n)^2}$

$= \sqrt{k^2 (v_1^2 + v_2^2 + \cdots + v_n^2)}$

$= |k| \|v\|$



Unit Vectors

A vector of norm 1 is called a unit vector. Such vectors are useful for specifying a direction when length is not
relevant to the problem at hand. You can obtain a unit vector in a desired direction by choosing any nonzero
vector v in that direction and multiplying v by the reciprocal of its length. For example, if v is a vector of
length 2 in $R^2$ or $R^3$, then $\frac{1}{2}v$ is a unit vector in the same direction as v. More generally, if v is any nonzero
vector in $R^n$, then

$u = \frac{1}{\|v\|} v$ (4)

defines a unit vector that is in the same direction as v. We can confirm that 4 is a unit vector by applying part
(c) of Theorem 3.2.1 with $k = 1/\|v\|$ to obtain

$\|u\| = \|kv\| = |k| \|v\| = k\|v\| = \frac{1}{\|v\|} \|v\| = 1$

The process of multiplying a nonzero vector by the reciprocal of its length to obtain a unit vector is called
normalizing v.

WARNING

Sometimes you will see Formula 4 expressed as

$u = \frac{v}{\|v\|}$

This is just a more compact way of writing that
formula and is not intended to convey that v is
being divided by $\|v\|$.

EXAMPLE 2 Normalizing a Vector

Find the unit vector u that has the same direction as v = (2, 2, -1).

Solution The vector v has length

$\|v\| = \sqrt{2^2 + 2^2 + (-1)^2} = 3$

Thus, from 4

$u = \frac{1}{3}(2, 2, -1) = \left(\frac{2}{3}, \frac{2}{3}, -\frac{1}{3}\right)$

As a check, you may want to confirm that $\|u\| = 1$.
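Normalizing can be sketched in code as follows. This is an illustrative helper, not part of the text: the name `normalize` and the choice to raise an error for the zero vector (which has no direction) are assumptions.

```python
import math

def normalize(v):
    """Return the unit vector (1/||v||) v in the same direction as nonzero v."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(x / length for x in v)

u = normalize((2, 2, -1))   # the vector of Example 2, whose length is 3
print(u)
print(math.sqrt(sum(x * x for x in u)))  # norm of the result, approximately 1
```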



The Standard Unit Vectors 

When a rectangular coordinate system is introduced in $R^2$ or $R^3$, the unit vectors in the positive directions of
the coordinate axes are called the standard unit vectors. In $R^2$ these vectors are denoted by

i = (1, 0) and j = (0, 1)

and in $R^3$ by

i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1)

(Figure 3.2.2). Every vector $v = (v_1, v_2)$ in $R^2$ and every vector $v = (v_1, v_2, v_3)$ in $R^3$ can be expressed as a
linear combination of standard unit vectors by writing

$v = (v_1, v_2) = v_1(1, 0) + v_2(0, 1) = v_1 i + v_2 j$ (5)

$v = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1 i + v_2 j + v_3 k$ (6)

Moreover, we can generalize these formulas to $R^n$ by defining the standard unit vectors in $R^n$ to be

$e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, 0, \ldots, 1)$ (7)

in which case every vector $v = (v_1, v_2, \ldots, v_n)$ in $R^n$ can be expressed as

$v = (v_1, v_2, \ldots, v_n) = v_1 e_1 + v_2 e_2 + \cdots + v_n e_n$ (8)

EXAMPLE 3 Linear Combinations of Standard Unit Vectors

$(2, -3, 4) = 2i - 3j + 4k$

$(7, 3, -4, 5) = 7e_1 + 3e_2 - 4e_3 + 5e_4$




Distance in $R^n$

If $P_1$ and $P_2$ are points in $R^2$ or $R^3$, then the length of the vector $\overrightarrow{P_1P_2}$ is equal to the distance d between the
two points (Figure 3.2.3). Specifically, if $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ are points in $R^2$, then Formula 4 of
Section 3.1 implies that

$d = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ (9)

This is the familiar distance formula from analytic geometry. Similarly, the distance between the points
$P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$ in 3-space is

$d(u, v) = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$ (10)

Motivated by Formulas 9 and 10, we make the following definition.

DEFINITION 2

If $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ are points in $R^n$, then we denote the distance between
u and v by d(u, v) and define it to be

$d(u, v) = \|u - v\| = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$ (11)



Figure 3.2.3

We noted in the previous section that n-tuples
can be viewed either as vectors or points in $R^n$.
In Definition 2 we chose to describe them as
points, as that seemed the more natural
interpretation.

EXAMPLE 4 Calculating Distance in $R^n$

If

u = (1, 3, -2, 7) and v = (0, 7, 2, 2)

then the distance between u and v is

$d(u, v) = \sqrt{(1 - 0)^2 + (3 - 7)^2 + (-2 - 2)^2 + (7 - 2)^2} = \sqrt{58}$
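The distance between two points is the norm of their difference, which can be sketched as below. The function name `distance` and the length check are illustrative assumptions.

```python
import math

def distance(u, v):
    """Euclidean distance d(u, v) = ||u - v|| between points of equal dimension."""
    if len(u) != len(v):
        raise ValueError("points must have the same number of components")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# The points of Example 4; the squared distance is 1 + 16 + 16 + 25 = 58.
print(distance((1, 3, -2, 7), (0, 7, 2, 2)))
```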



Dot Product 

Our next objective is to define a useful multiplication operation on vectors in $R^2$ and $R^3$ and then extend that
operation to $R^n$. To do this we will first need to define exactly what we mean by the "angle" between two
vectors in $R^2$ or $R^3$. For this purpose, let u and v be nonzero vectors in $R^2$ or $R^3$ that have been positioned so
that their initial points coincide. We define the angle between u and v to be the angle θ determined by u and v
that satisfies the inequalities $0 \le \theta \le \pi$ (Figure 3.2.4).

DEFINITION 3

If u and v are nonzero vectors in $R^2$ or $R^3$, and if θ is the angle between u and v, then the dot product
(also called the Euclidean inner product) of u and v is denoted by u · v and is defined as

$u \cdot v = \|u\| \|v\| \cos\theta$ (12)

If u = 0 or v = 0, then we define u · v to be 0.

Figure 3.2.4 The angle θ between u and v satisfies $0 \le \theta \le \pi$.

The sign of the dot product reveals information about the angle θ that we can obtain by rewriting Formula 12
as

$\cos\theta = \frac{u \cdot v}{\|u\| \|v\|}$ (13)

Since $0 \le \theta \le \pi$, it follows from Formula 13 and properties of the cosine function studied in trigonometry that

• θ is acute if u · v > 0.

• θ is obtuse if u · v < 0.

• θ = π/2 if u · v = 0.

EXAMPLE 5 Dot Product

Find the dot product of the vectors shown in Figure 3.2.5.

Figure 3.2.5 [the labeled point is (0, 2, 2)]

Solution The lengths of the vectors are

$\|u\| = 1$ and $\|v\| = \sqrt{8} = 2\sqrt{2}$

and the cosine of the angle θ between them is

$\cos(45°) = 1/\sqrt{2}$

Thus, it follows from Formula 12 that

$u \cdot v = \|u\| \|v\| \cos\theta = (1)(2\sqrt{2})(1/\sqrt{2}) = 2$



EXAMPLE 6 A Geometry Problem Solved Using Dot Product

Find the angle between a diagonal of a cube and one of its edges.

Solution Let k be the length of an edge and introduce a coordinate system as shown in Figure 3.2.6.
If we let $u_1 = (k, 0, 0)$, $u_2 = (0, k, 0)$, and $u_3 = (0, 0, k)$, then the vector

$d = (k, k, k) = u_1 + u_2 + u_3$

is a diagonal of the cube. It follows from Formula 13 that the angle θ between d and the edge $u_1$
satisfies

$\cos\theta = \frac{u_1 \cdot d}{\|u_1\| \|d\|} = \frac{k^2}{(k)(\sqrt{3k^2})} = \frac{1}{\sqrt{3}}$

so that $\theta = \cos^{-1}(1/\sqrt{3}) \approx 54.74°$.

Figure 3.2.6

Note that the angle θ obtained in Example 6
does not involve k. Why was this to be
expected?
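A short numerical experiment illustrates that the diagonal-to-edge angle of a cube does not depend on the edge length k. This is a sketch only: the function name `diagonal_edge_angle` and the sample edge lengths are hypothetical, and the dot product is computed from components.

```python
import math

def dot(u, v):
    """Sum of products of corresponding components."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def diagonal_edge_angle(k):
    """Angle in degrees between the diagonal (k, k, k) and the edge (k, 0, 0)."""
    d = (k, k, k)
    u1 = (k, 0, 0)
    cos_theta = dot(u1, d) / (norm(u1) * norm(d))
    return math.degrees(math.acos(cos_theta))

# Arbitrary sample edge lengths; each should give the same angle.
for k in (1, 2.5, 10):
    print(k, round(diagonal_edge_angle(k), 2))
```

Scaling the cube scales both vectors by the same factor, and the factor cancels in the quotient, which is why every edge length yields the same angle.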

Component Form of the Dot Product 



For computational purposes it is desirable to have a formula that expresses the dot product of two vectors in 
terms of components. We will derive such a formula for vectors in 3-space; the derivation for vectors in 
2-space is similar. 



Let $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$ be two nonzero vectors. If, as shown in Figure 3.2.7, θ is the angle
between u and v, then the law of cosines yields

$\|\overrightarrow{PQ}\|^2 = \|u\|^2 + \|v\|^2 - 2\|u\| \|v\| \cos\theta$ (14)

Since $\overrightarrow{PQ} = v - u$, we can rewrite 14 as

$\|u\| \|v\| \cos\theta = \frac{1}{2}\left(\|u\|^2 + \|v\|^2 - \|v - u\|^2\right)$

or

$u \cdot v = \frac{1}{2}\left(\|u\|^2 + \|v\|^2 - \|v - u\|^2\right)$

Substituting

$\|u\|^2 = u_1^2 + u_2^2 + u_3^2, \quad \|v\|^2 = v_1^2 + v_2^2 + v_3^2$

and

$\|v - u\|^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2$

we obtain, after simplifying,

$u \cdot v = u_1v_1 + u_2v_2 + u_3v_3$ (15)

Although we derived Formula 15 and its
2-space companion under the assumption that u
and v are nonzero, it turns out that these
formulas are also applicable if u = 0 or v = 0
(verify).

The companion formula for vectors in 2-space is

$u \cdot v = u_1v_1 + u_2v_2$ (16)

Motivated by the pattern in Formulas 15 and 16, we make the following definition.

DEFINITION 4

If $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$, then the dot product (also called the
Euclidean inner product) of u and v is denoted by u · v and is defined by

$u \cdot v = u_1v_1 + u_2v_2 + \cdots + u_nv_n$ (17)

In words, to calculate the dot product
(Euclidean inner product) multiply
corresponding components and add the
resulting products.

Historical Note The dot product notation was first introduced by the American physicist and
mathematician J. Willard Gibbs in a pamphlet distributed to his students at Yale University in the
1880s. The product was originally written on the baseline, rather than centered as today, and was
referred to as the direct product. Gibbs's pamphlet was eventually incorporated into a book entitled
Vector Analysis that was published in 1901 and coauthored with one of his students. Gibbs made major
contributions to the fields of thermodynamics and electromagnetic theory and is generally regarded as
the greatest American physicist of the nineteenth century.
[Image: The Granger Collection, New York]

Josiah Willard Gibbs (1839-1903)

EXAMPLE 7 Calculating Dot Products Using Components

(a) Use Formula 15 to compute the dot product of the vectors u and v in Example 5.

(b) Calculate u · v for the following vectors in $R^4$:

u = (-1, 3, 5, 7), v = (-3, -4, 1, 0)

Solution

(a) The component forms of the vectors are u = (0, 0, 1) and v = (0, 2, 2). Thus,

u · v = (0)(0) + (0)(2) + (1)(2) = 2

which agrees with the result obtained geometrically in Example 5.

(b) u · v = (-1)(-3) + (3)(-4) + (5)(1) + (7)(0) = -4
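The component formula for the dot product is a one-line computation. A minimal sketch, assuming vectors are tuples of equal length; the name `dot` is illustrative.

```python
def dot(u, v):
    """Dot product: multiply corresponding components and add the products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))

# Parts (a) and (b) of Example 7.
print(dot((0, 0, 1), (0, 2, 2)))
print(dot((-1, 3, 5, 7), (-3, -4, 1, 0)))
```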




Figure 3.2.7 



Algebraic Properties of the Dot Product

In the special case where u = v in Definition 4, we obtain the relationship

$v \cdot v = v_1^2 + v_2^2 + \cdots + v_n^2 = \|v\|^2$ (18)

This yields the following formula for expressing the length of a vector in terms of a dot product:

$\|v\| = \sqrt{v \cdot v}$ (19)

Dot products have many of the same algebraic properties as products of real numbers.

THEOREM 3.2.2

If u, v, and w are vectors in $R^n$, and if k is a scalar, then:

(a) u · v = v · u [Symmetry property]

(b) u · (v + w) = u · v + u · w [Distributive property]

(c) k(u · v) = (ku) · v [Homogeneity property]

(d) v · v ≥ 0 and v · v = 0 if and only if v = 0 [Positivity property]

We will prove parts (c) and (d) and leave the other proofs as exercises.

Proof (c) Let $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$. Then

$k(u \cdot v) = k(u_1v_1 + u_2v_2 + \cdots + u_nv_n)$

$= (ku_1)v_1 + (ku_2)v_2 + \cdots + (ku_n)v_n = (ku) \cdot v$

Proof (d) The result follows from parts (a) and (b) of Theorem 3.2.1 and the fact that

$v \cdot v = v_1v_1 + v_2v_2 + \cdots + v_nv_n = v_1^2 + v_2^2 + \cdots + v_n^2 = \|v\|^2$



The next theorem gives additional properties of dot products. The proofs can be obtained either by expressing 
the vectors in terms of components or by using the algebraic properties established in Theorem 3.2.2. 

THEOREM 3.2.3

If u, v, and w are vectors in $R^n$, and if k is a scalar, then:

(a) 0 · v = v · 0 = 0

(b) (u + v) · w = u · w + v · w

(c) u · (v - w) = u · v - u · w

(d) (u - v) · w = u · w - v · w

(e) k(u · v) = u · (kv)

We will show how Theorem 3.2.2 can be used to prove part (b) without breaking the vectors into components.
The other proofs are left as exercises.

Proof (b)

(u + v) · w = w · (u + v) [By symmetry]

= w · u + w · v [By distributivity]

= u · w + v · w [By symmetry]



Formulas 18 and 19 together with Theorems 3.2.2 and 3.2.3 make it possible to manipulate expressions 
involving dot products using familiar algebraic techniques. 

EXAMPLE 8 Calculating with Dot Products

$(u - 2v) \cdot (3u + 4v) = u \cdot (3u + 4v) - 2v \cdot (3u + 4v)$

$= 3(u \cdot u) + 4(u \cdot v) - 6(v \cdot u) - 8(v \cdot v)$

$= 3\|u\|^2 - 2(u \cdot v) - 8\|v\|^2$
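The identity of Example 8 can be spot-checked numerically. The sketch below uses two arbitrarily chosen integer vectors (they are not from the text) so that both sides can be compared exactly; a numerical check of this kind illustrates the identity but does not prove it.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def combo(a, u, b, v):
    """The linear combination a*u + b*v, computed componentwise."""
    return tuple(a * x + b * y for x, y in zip(u, v))

# Hypothetical sample vectors chosen only for illustration.
u = (1, 2, 3)
v = (4, -1, 0)

lhs = dot(combo(1, u, -2, v), combo(3, u, 4, v))   # (u - 2v) . (3u + 4v)
rhs = 3 * dot(u, u) - 2 * dot(u, v) - 8 * dot(v, v)  # 3||u||^2 - 2(u.v) - 8||v||^2

print(lhs, rhs)
```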



Cauchy-Schwarz Inequality and Angles in $R^n$

Our next objective is to extend to $R^n$ the notion of "angle" between nonzero vectors u and v. We will do this
by starting with the formula

$\theta = \cos^{-1}\left(\frac{u \cdot v}{\|u\| \|v\|}\right)$ (20)

which we previously derived for nonzero vectors in $R^2$ and $R^3$. Since dot products and norms have been
defined for vectors in $R^n$, it would seem that this formula has all the ingredients to serve as a definition of the
angle θ between two vectors, u and v, in $R^n$. However, there is a fly in the ointment, the problem being that the
inverse cosine in Formula 20 is not defined unless its argument satisfies the inequalities

$-1 \le \frac{u \cdot v}{\|u\| \|v\|} \le 1$ (21)

Fortunately, these inequalities do hold for all nonzero vectors in $R^n$ as a result of the following fundamental
result known as the Cauchy-Schwarz inequality.

THEOREM 3.2.4 Cauchy-Schwarz Inequality

If $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$, then

$|u \cdot v| \le \|u\| \|v\|$ (22)

or in terms of components

$|u_1v_1 + u_2v_2 + \cdots + u_nv_n| \le (u_1^2 + u_2^2 + \cdots + u_n^2)^{1/2} (v_1^2 + v_2^2 + \cdots + v_n^2)^{1/2}$ (23)

We will omit the proof of this theorem because later in the text we will prove a more general version of which
this will be a special case. Our goal for now will be to use this theorem to prove that the inequalities in 21 hold
for all nonzero vectors in $R^n$. Once that is done we will have established all the results required to use Formula
20 as our definition of the angle between nonzero vectors u and v in $R^n$.

To prove that the inequalities in 21 hold for all nonzero vectors in $R^n$, divide both sides of Formula 22 by the
product $\|u\| \|v\|$ to obtain

$\frac{|u \cdot v|}{\|u\| \|v\|} \le 1$ or equivalently $\left|\frac{u \cdot v}{\|u\| \|v\|}\right| \le 1$

from which 21 follows.
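The angle formula, θ = arccos(u · v / (‖u‖ ‖v‖)), can be sketched in code as follows. The Cauchy-Schwarz inequality guarantees that the quotient lies in [-1, 1] in exact arithmetic; the clamping step below is an added assumption that guards against floating-point rounding pushing it slightly outside that interval. The function name `angle` and the sample vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(u, v):
    """Angle in radians between nonzero vectors u and v."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("the angle is defined only for nonzero vectors")
    ratio = dot(u, v) / (nu * nv)
    # Exact arithmetic gives -1 <= ratio <= 1; clamp to absorb rounding error.
    ratio = max(-1.0, min(1.0, ratio))
    return math.acos(ratio)

u = (1, 2, 3, 4)
v = (2, 4, 6, 8)     # v = 2u, so the angle is (approximately) 0
w = (-4, 3, 2, -2)   # chosen so that u . w = 0

print(angle(u, v))
print(angle(u, w))
```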




Hermann Amandus Schwarz (1843-1921) 




Viktor Yakovlevich Bunyakovsky (1804-1889) 

Historical Note The Cauchy-Schwarz inequality is named in honor of the French mathematician
Augustin Cauchy (see p. 109) and the German mathematician Hermann Schwarz. Variations of this
inequality occur in many different settings and under various names. Depending on the context in
which the inequality occurs, you may find it called Cauchy's inequality, the Schwarz inequality, or
sometimes even the Bunyakovsky inequality, in recognition of the Russian mathematician who
published his version of the inequality in 1859, about 25 years before Schwarz.
[Images: wikipedia (Schwarz); wikipedia (Bunyakovsky)]



Geometry in $R^n$

Earlier in this section we extended various concepts to $R^n$ with the idea that familiar results that we can
visualize in $R^2$ and $R^3$ might be valid in $R^n$ as well. Here are two fundamental theorems from plane geometry
whose validity extends to $R^n$:

• The sum of the lengths of two sides of a triangle is at least as large as the third (Figure 3.2.8).

• The shortest distance between two points is a straight line (Figure 3.2.9).

The following theorem generalizes these theorems to $R^n$.

THEOREM 3.2.5

If u, v, and w are vectors in $R^n$, and if k is any scalar, then:

(a) $\|u + v\| \le \|u\| + \|v\|$ [Triangle inequality for vectors]

(b) $d(u, v) \le d(u, w) + d(w, v)$ [Triangle inequality for distances]

Proof (a)

$\|u + v\|^2 = (u + v) \cdot (u + v) = (u \cdot u) + 2(u \cdot v) + (v \cdot v)$

$= \|u\|^2 + 2(u \cdot v) + \|v\|^2$

$\le \|u\|^2 + 2|u \cdot v| + \|v\|^2$ [Property of absolute value]

$\le \|u\|^2 + 2\|u\| \|v\| + \|v\|^2$ [Cauchy-Schwarz inequality]

$= (\|u\| + \|v\|)^2$

Since both $\|u + v\|$ and $\|u\| + \|v\|$ are nonnegative, taking square roots gives part (a).

Proof (b) It follows from part (a) and Formula 11 that

$d(u, v) = \|u - v\| = \|(u - w) + (w - v)\|$

$\le \|u - w\| + \|w - v\| = d(u, w) + d(w, v)$



Figure 3.2.8 $\|u + v\| \le \|u\| + \|v\|$

Figure 3.2.9 $d(u, v) \le d(u, w) + d(w, v)$

It is proved in plane geometry that for any parallelogram the sum of the squares of the diagonals is equal to the
sum of the squares of the four sides (Figure 3.2.10). The following theorem generalizes that result to $R^n$.

THEOREM 3.2.6 Parallelogram Equation for Vectors

If u and v are vectors in $R^n$, then

$\|u + v\|^2 + \|u - v\|^2 = 2\left(\|u\|^2 + \|v\|^2\right)$ (24)

Proof

$\|u + v\|^2 + \|u - v\|^2 = (u + v) \cdot (u + v) + (u - v) \cdot (u - v)$

$= 2(u \cdot u) + 2(v \cdot v)$

$= 2\left(\|u\|^2 + \|v\|^2\right)$

Figure 3.2.10

We could state and prove many more theorems from plane geometry that generalize to $R^n$, but the ones already
given should suffice to convince you that $R^n$ is not so different from $R^2$ and $R^3$ even though we cannot
visualize it directly. The next theorem establishes a fundamental relationship between the dot product and norm
in $R^n$.

THEOREM 3.2.7

If u and v are vectors in $R^n$ with the Euclidean inner product, then

$u \cdot v = \frac{1}{4}\|u + v\|^2 - \frac{1}{4}\|u - v\|^2$ (25)

Proof

$\|u + v\|^2 = (u + v) \cdot (u + v) = \|u\|^2 + 2(u \cdot v) + \|v\|^2$

$\|u - v\|^2 = (u - v) \cdot (u - v) = \|u\|^2 - 2(u \cdot v) + \|v\|^2$

from which 25 follows by simple algebra.

Note that Formula 25 expresses the dot product
in terms of norms.
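The "simple algebra" in the proof of Theorem 3.2.7 amounts to subtracting the second expansion from the first, so that the squared norms cancel. Written out as a worked step (a sketch of the omitted computation, using the same symbols as the proof):

```latex
\begin{align*}
\|u + v\|^2 - \|u - v\|^2
  &= \left(\|u\|^2 + 2(u \cdot v) + \|v\|^2\right)
   - \left(\|u\|^2 - 2(u \cdot v) + \|v\|^2\right) \\
  &= 4(u \cdot v), \\
\text{so}\quad u \cdot v
  &= \tfrac{1}{4}\|u + v\|^2 - \tfrac{1}{4}\|u - v\|^2 .
\end{align*}
```

Adding the two expansions instead of subtracting them gives the parallelogram equation of Theorem 3.2.6.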



Dot Products as Matrix Multiplication 

There are various ways to express the dot product of vectors using matrix notation. The formulas depend on 
whether the vectors are expressed as row matrices or column matrices. Here are the possibilities. 

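If A is an n × n matrix and u and v are n × 1 matrices, then it follows from the first row in Table 1 and
properties of the transpose that

$Au \cdot v = v^T(Au) = (v^TA)u = (A^Tv)^Tu = u \cdot A^Tv$

$u \cdot Av = (Av)^Tu = (v^TA^T)u = v^T(A^Tu) = A^Tu \cdot v$

The resulting formulas

$Au \cdot v = u \cdot A^Tv$ (26)

$u \cdot Av = A^Tu \cdot v$ (27)

provide an important link between multiplication by an n × n matrix A and multiplication by $A^T$.

EXAMPLE 9 Verifying That $Au \cdot v = u \cdot A^Tv$

Suppose that

$A = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \quad u = \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix}, \quad v = \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix}$

Then

$Au = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \\ 5 \end{bmatrix}, \quad A^Tv = \begin{bmatrix} 1 & 2 & -1 \\ -2 & 4 & 0 \\ 3 & 1 & 1 \end{bmatrix} \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} -7 \\ 4 \\ -1 \end{bmatrix}$

from which we obtain

$Au \cdot v = 7(-2) + 10(0) + 5(5) = 11$

$u \cdot A^Tv = (-1)(-7) + 2(4) + 4(-1) = 11$

Thus, $Au \cdot v = u \cdot A^Tv$, as guaranteed by Formula 26. We leave it for you to verify that Formula
27 also holds.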
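Both transpose formulas, Au · v = u · Aᵀv (Formula 26) and u · Av = Aᵀu · v (Formula 27), can be checked numerically. The sketch below assumes the matrix and vectors of Example 9 are A = [[1, -2, 3], [2, 4, 1], [-1, 0, 1]], u = (-1, 2, 4), and v = (-2, 0, 5), values that are consistent with the products computed in that example; the helper names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    """Matrix-vector product: each entry is a row of A dotted with x."""
    return tuple(dot(row, x) for row in A)

def transpose(A):
    return [list(col) for col in zip(*A)]

# Assumed data for Example 9 (see the lead-in above).
A = [[1, -2, 3],
     [2, 4, 1],
     [-1, 0, 1]]
u = (-1, 2, 4)
v = (-2, 0, 5)

lhs26 = dot(mat_vec(A, u), v)              # Au . v
rhs26 = dot(u, mat_vec(transpose(A), v))   # u . A^T v
lhs27 = dot(u, mat_vec(A, v))              # u . Av
rhs27 = dot(mat_vec(transpose(A), u), v)   # A^T u . v

print(lhs26, rhs26)
print(lhs27, rhs27)
```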



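Table 1

Form: u a column matrix and v a column matrix
Dot Product: $u \cdot v = u^Tv = v^Tu$
Example: $u = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$, $v = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$; $u^Tv = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix} \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$, $v^Tu = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$

Form: u a row matrix and v a column matrix
Dot Product: $u \cdot v = uv = v^Tu^T$
Example: $u = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}$, $v = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$; $uv = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix} \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$, $v^Tu^T = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$

Form: u a column matrix and v a row matrix
Dot Product: $u \cdot v = vu = u^Tv^T$
Example: $u = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$, $v = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}$; $vu = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$, $u^Tv^T = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix} \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$

Form: u a row matrix and v a row matrix
Dot Product: $u \cdot v = uv^T = vu^T$
Example: $u = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}$, $v = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}$; $uv^T = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix} \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$, $vu^T = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$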





A Dot Product View of Matrix Multiplication

Dot products provide another way of thinking about matrix multiplication. Recall that if $A = [a_{ij}]$ is an m × r
matrix and $B = [b_{ij}]$ is an r × n matrix, then the ijth entry of AB is

$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj}$

which is the dot product of the ith row vector of A

$\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ir} \end{bmatrix}$

and the jth column vector of B

$\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{rj} \end{bmatrix}$

Thus, if the row vectors of A are $r_1, r_2, \ldots, r_m$ and the column vectors of B are $c_1, c_2, \ldots, c_n$, then the matrix
product AB can be expressed as

$AB = \begin{bmatrix} r_1 \cdot c_1 & r_1 \cdot c_2 & \cdots & r_1 \cdot c_n \\ r_2 \cdot c_1 & r_2 \cdot c_2 & \cdots & r_2 \cdot c_n \\ \vdots & \vdots & & \vdots \\ r_m \cdot c_1 & r_m \cdot c_2 & \cdots & r_m \cdot c_n \end{bmatrix}$ (28)

Application of Dot Products to ISBN Numbers

Although the system has recently changed, most books published in the last 25 years have been
assigned a unique 10-digit number called an International Standard Book Number or ISBN. The first
nine digits of this number are split into three groups: the first group representing the country or group
of countries in which the book originates, the second identifying the publisher, and the third assigned to
the book title itself. The tenth and final digit, called a check digit, is computed from the first nine digits
and is used to ensure that an electronic transmission of the ISBN, say over the Internet, occurs without
error.

To explain how this is done, regard the first nine digits of the ISBN as a vector b in $R^9$, and let a be the
vector

a = (1, 2, 3, 4, 5, 6, 7, 8, 9)

Then the check digit c is computed using the following procedure:

1. Form the dot product a · b.

2. Divide a · b by 11, thereby producing a remainder c that is an integer between 0 and 10, inclusive.
The check digit is taken to be c, with the proviso that c = 10 is written as X to avoid double digits.

For example, the ISBN of the brief edition of Calculus, sixth edition, by Howard Anton is

0-471-15307-9

which has a check digit of 9. This is consistent with the first nine digits of the ISBN, since

a · b = (1, 2, 3, 4, 5, 6, 7, 8, 9) · (0, 4, 7, 1, 1, 5, 3, 0, 7) = 152

Dividing 152 by 11 produces a quotient of 13 and a remainder of 9, so the check digit is c = 9. If an
electronic order is placed for a book with a certain ISBN, then the warehouse can use the above
procedure to verify that the check digit is consistent with the first nine digits, thereby reducing the
possibility of a costly shipping error.
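The check-digit procedure for 10-digit ISBNs can be sketched as a short function. The name `isbn10_check_digit` and the input format (a string holding the first nine digits, with hyphens already removed) are assumptions made for illustration.

```python
def isbn10_check_digit(first_nine):
    """Check digit for a 10-digit ISBN, given its first nine digits as a string."""
    if len(first_nine) != 9 or any(ch not in "0123456789" for ch in first_nine):
        raise ValueError("expected a string of exactly nine decimal digits")
    b = [int(ch) for ch in first_nine]
    a = range(1, 10)                       # the weight vector (1, 2, ..., 9)
    c = sum(ai * bi for ai, bi in zip(a, b)) % 11   # remainder of a . b on division by 11
    return "X" if c == 10 else str(c)

# First nine digits of 0-471-15307-9, the example in the text.
print(isbn10_check_digit("047115307"))
```

A complete ISBN can then be validated by comparing its tenth character with the value this function returns for the first nine.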



Concept Review 

• Norm (or length or magnitude) of a vector

• Unit vector

• Normalized vector

• Standard unit vectors

• Distance between points in $R^n$

• Angle between two vectors in $R^n$

• Dot product (or Euclidean inner product) of two vectors in $R^n$

• Cauchy-Schwarz inequality

• Triangle inequality

• Parallelogram equation for vectors

Skills

• Compute the norm of a vector in $R^n$.

• Determine whether a given vector in $R^n$ is a unit vector.

• Normalize a nonzero vector in $R^n$.

• Determine the distance between two vectors in $R^n$.

• Compute the dot product of two vectors in $R^n$.

• Compute the angle between two nonzero vectors in $R^n$.

• Prove basic properties pertaining to norms and dot products (Theorems 3.2.1-3.2.3 and 3.2.5-3.2.7).



Exercise Set 3.2 



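In Exercises 1-2, find the norm of v, a unit vector that has the same direction as v, and a unit vector that is
oppositely directed to v.

1. (a) v = (4, -3)

(b) v = (2, 2, 2)

(c) v = (1, 0, 2, 1, 3)

Answer:

(a) $\|v\| = 5$, $\frac{v}{\|v\|} = \left(\frac{4}{5}, -\frac{3}{5}\right)$, $-\frac{v}{\|v\|} = \left(-\frac{4}{5}, \frac{3}{5}\right)$

(b) $\|v\| = 2\sqrt{3}$, $\frac{v}{\|v\|} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right)$, $-\frac{v}{\|v\|} = \left(-\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}\right)$

(c) $\|v\| = \sqrt{15}$, $\frac{v}{\|v\|} = \frac{1}{\sqrt{15}}(1, 0, 2, 1, 3)$, $-\frac{v}{\|v\|} = -\frac{1}{\sqrt{15}}(1, 0, 2, 1, 3)$

2. (a) v = (-5, 12)

(b) v = (1, -1, 2)

(c) v = (-2, 3, 3, -1)

In Exercises 3-4, evaluate the given expression with u = (2, -2, 3), v = (1, -3, 4), and
w = (3, 6, -4).

3. (a) $\|u + v\|$

(b) $\|u\| + \|v\|$

(c) $\|-2u + 2v\|$

(d) $\|3u - 5v + w\|$

Answer:

(a) $\|u + v\| = \sqrt{83}$

(b) $\|u\| + \|v\| = \sqrt{17} + \sqrt{26}$

(c) $\|-2u + 2v\| = 2\sqrt{3}$

(d) $\|3u - 5v + w\| = \sqrt{466}$

4. (a) $\|u + v + w\|$

(b) $\|u - v\|$

(c) $\|3v\| - 3\|v\|$

(d) $\|u\| - \|v\|$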

In Exercises 5-6, evaluate the given expression with u = (-2, -1, 4, 5), v = (3, 1, -5, 7), and
w = (-6, 2, 1, 1).

5. (a) $\|3u - 5v + w\|$

(b) $\|3u\| - 5\|v\| + \|w\|$

(c) $\|-\|u\| v\|$

Answer:

(a) $\|3u - 5v + w\| = \sqrt{2570}$

(b) $\|3u\| - 5\|v\| + \|w\| = 3\sqrt{46} - 10\sqrt{21} + \sqrt{42}$

(c) $\|-\|u\| v\| = 2\sqrt{966}$

6. (a) $\|u\| - 2\|v\| - 3\|w\|$

(b) $\|u\| + \|-2v\| + \|-3w\|$

(c) $\|\,\|u - v\| w\|$

7. Let v = (-2, 3, 0, 6). Find all scalars k such that $\|kv\| = 5$.

Answer:

$k = \frac{5}{7}$ or $k = -\frac{5}{7}$

8. Let v = (1, 1, 2, -3, 1). Find all scalars k such that $\|kv\| = 4$.

In Exercises 9-10, find u · v, u · u, and v · v.

9. (a) u = (3, 1, 4), v = (2, 2, -4)

(b) u = (1, 1, 4, 6), v = (2, -2, 3, -2)

Answer:

(a) u · v = -8, u · u = 26, v · v = 24

(b) u · v = 0, u · u = 54, v · v = 21

10. (a) u = (1, 1, -2, 3), v = (-1, 0, 5, 1)

(b) u = (2, -1, 1, 0, -2), v = (1, 2, 2, 2, 1)

In Exercises 11-12, find the Euclidean distance between u and v.

11. (a) u = (3, 3, 3), v = (1, 0, 4)

(b) u = (0, -2, -1, 1), v = (-3, 2, 4, 4)

(c) u = (3, -3, -2, 0, -3, 13, 5),
v = (-4, 1, -1, 5, 0, -11, 4)

Answer:

(a) $\|u - v\| = \sqrt{14}$

(b) $\|u - v\| = \sqrt{59}$

(c) $\|u - v\| = \sqrt{677}$

12. (a) u = (1, 2, -3, 0), v = (5, 1, 2, -2)

(b) u = (2, -1, -4, 1, 0, 6, -3, 1),
v = (-2, -1, 0, 3, 7, 2, -5, 1)

(c) u = (0, 1, 1, 1, 2), v = (2, 1, 0, -1, 3)

13. Find the cosine of the angle between the vectors in each part of Exercise 11, and then state whether the
angle is acute, obtuse, or 90°.

Answer:

(a) $\cos\theta = \frac{15}{\sqrt{27}\sqrt{17}}$; θ is acute

(b) $\cos\theta = -\frac{4}{\sqrt{6}\sqrt{45}}$; θ is obtuse

(c) $\cos\theta = -\frac{136}{\sqrt{225}\sqrt{180}}$; θ is obtuse

14. Find the cosine of the angle between the vectors in each part of Exercise 12, and then state whether the
angle is acute, obtuse, or 90°.

15. Suppose that a vector a in the xy-plane has a length of 9 units and points in a direction that is 120°
counterclockwise from the positive x-axis, and a vector b in that plane has a length of 5 units and points in
the positive y-direction. Find a · b.

Answer:

$a \cdot b = (9)(5)\cos 30° = \frac{45\sqrt{3}}{2}$

16. Suppose that a vector a in the xy-plane points in a direction that is 47° counterclockwise from the positive
x-axis, and a vector b in that plane points in a direction that is 43° clockwise from the positive x-axis. What
can you say about the value of a · b?

In Exercises 17-18, determine whether the expression makes sense mathematically. If not, explain why. 

17. (a) u · (v · w)

(b) u · (v + w)

(c) $\|u \cdot v\|$

(d) $(u \cdot v) - \|u\|$

Answer:

(a) u · (v · w) does not make sense because v · w is a scalar.

(b) u · (v + w) makes sense.

(c) $\|u \cdot v\|$ does not make sense because the quantity inside the norm is a scalar.

(d) $(u \cdot v) - \|u\|$ makes sense since the terms are both scalars.

18. (a) $\|u\| \cdot \|v\|$

(b) (u · v) - w

(c) (u · v) - k

(d) k · u

19. Find a unit vector that has the same direction as the given vector. 

(a) (-4, -3)

(b) (1, 7)

(c) $(-3, 2, \sqrt{3})$

(d) (1, 2, 3, 4, 5)



Answer: 




20. Find a unit vector that is oppositely directed to the given vector. 

(a) (-12, -5)

(b) (3, -3, -3)

(c) (-6, 8)

(d) $(-3, 1, \sqrt{6}, 3)$

21. State a procedure for finding a vector of a specified length m that points in the same direction as a given 
vector V. 

22. If ||v|| = 2 and ||w|| = 3, what are the largest and smallest values possible for ||v - w||? Give a geometric
explanation of your results. 

23. Find the cosine of the angle θ between u and v.

(a) u = (2, 3), v = (5, -7)

(b) u = (-6, -2), v = (4, 0)

(c) u = (1, -5, 4), v = (3, 3, 3)

(d) u = (-2, 2, 3), v = (1, 7, -4)

Answer:

(a) $\cos\theta = -\frac{11}{\sqrt{13}\sqrt{74}}$

(b) $\cos\theta = -\frac{3}{\sqrt{10}}$

(c) $\cos\theta = 0$

(d) $\cos\theta = 0$

24. Find the radian measure of the angle θ (with $0 \le \theta \le \pi$) between u and v.

(a) (1, -7) and (21, 3)

(b) (0, 2) and (3, -3)

(c) (-1, 1, 0) and (0, -1, 1)

(d) (1, -1, 0) and (1, 0, 0)

In Exercises 25-26, verify that the Cauchy-Schwarz inequality holds.

25. (a) u = (3, 2), v = (4, -1)

(b) u = (-3, 1, 0), v = (2, -1, 3)

(c) u = (0, 2, 2, 1), v = (1, 1, 1, 1)

Answer:

(a) $|u \cdot v| = 10$, $\|u\| \|v\| = \sqrt{13}\sqrt{17} \approx 14.866$

(b) $|u \cdot v| = 7$, $\|u\| \|v\| = \sqrt{10}\sqrt{14} \approx 11.832$

(c) $|u \cdot v| = 5$, $\|u\| \|v\| = (3)(2) = 6$

26. (a) u = (4, 1, 1), v = (1, 2, 3)

(b) u = (1, 2, 1, 2, 3), v = (0, 1, 1, 5, -2)

(c) u = (1, 3, 5, 2, 0, 1), v = (0, 2, 4, 1, 3, 5)

27. Let $p_0 = (x_0, y_0, z_0)$ and p = (x, y, z). Describe the set of all points (x, y, z) for which $\|p - p_0\| = 1$.

Answer:

A sphere of radius 1 centered at $(x_0, y_0, z_0)$.

28. (a) Show that the components of the vector $v = (v_1, v_2)$ in Figure Ex-28a are $v_1 = \|v\| \cos\theta$ and
$v_2 = \|v\| \sin\theta$.

(b) Let u and v be the vectors in Figure Ex-28b. Use the result in part (a) to find the components of
4u - 5v.

Figure Ex-28

29. Prove parts (a) and (b) of Theorem 3.2.1. 

30. Prove parts (a) and (c) of Theorem 3.2.3. 

31. Prove parts (d) and (e) of Theorem 3.2.3. 

32. Under what conditions will the triangle inequality (Theorem 3.2.5a) be an equality? Explain your answer 
geometrically. 

33. What can you say about two nonzero vectors, u and v, that satisfy the equation ||u + v|| = ||u|| + ||v||? 

34. (a) What relationship must hold for the point p = (a, b, c) to be equidistant from the origin and the
xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and 
c. 

(b) What relationship must hold for the point p = (a, b,c) to be farther from the origin than from the 
xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and 
c 

True-False Exercises 

In parts (a)-(j) determine whether the statement is true or false, and justify your answer, 
(a) If each component of a vector in p} is doubled, the norm of that vector is doubled. 



Answer: 

True 

(b) In the vectors of norm 5 whose initial points are at the origin have terminal points lying on a circle of 
radius 5 centered at the origin. 

Answer: 

True 

(c) Every vector in has a positive norm. 
Answer: 



False 



(d) If V is a nonzero vector in fi", there are exactly two unit vectors that are parallel to v. 
Answer: 

True 

(e) If ||u|| = 2, ||v|| = 1, and a - v = L then the angle between u and v is a* / 3 radians. 
Answer: 

True 

(f) The expressions (u • v) + w and u • (v + w) are both meaningful and equal to each other.
Answer: 

False 

(g) If u • v = u • w, then v = w.
Answer: 

False 

(h) If u • v = 0, then either u = 0 or v = 0.
Answer: 

False 

(i) In R^2, if u lies in the first quadrant and v lies in the third quadrant, then u • v cannot be positive.
Answer: 

True 

(j) For all vectors u, v, and w in R^n, we have

$$\|\mathbf{u} + \mathbf{v} + \mathbf{w}\| \le \|\mathbf{u}\| + \|\mathbf{v}\| + \|\mathbf{w}\|$$

Answer: 

True 
3.3 Orthogonality



In the last section we defined the notion of "angle" between vectors in R^n. In this section we will focus on the notion of
"perpendicularity." Perpendicular vectors in R^n play an important role in a wide variety of applications.

Recall from Formula 20 in the previous section that the angle θ between two nonzero vectors u and v in R^n is defined by the
formula

$$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$$

It follows from this that θ = π/2 if and only if u • v = 0. Thus, we make the following definition.

DEFINITION 1

Two nonzero vectors u and v in R^n are said to be orthogonal (or perpendicular) if u • v = 0. We will also agree that the
zero vector in R^n is orthogonal to every vector in R^n. A nonempty set of vectors in R^n is called an orthogonal set if all
pairs of distinct vectors in the set are orthogonal. An orthogonal set of unit vectors is called an orthonormal set.

EXAMPLE 1 Orthogonal Vectors

(a) Show that u = (-2, 3, 1, 4) and v = (1, 2, 0, -1) are orthogonal vectors in R^4.

(b) Show that the set S = {i, j, k} of standard unit vectors is an orthogonal set in R^3.

Solution

(a) The vectors are orthogonal since

u • v = (-2)(1) + (3)(2) + (1)(0) + (4)(-1) = 0

(b) We must show that all pairs of distinct vectors are orthogonal, that is,

i • j = i • k = j • k = 0

This is evident geometrically (Figure 3.2.2), but it can be seen as well from the computations

i • j = (1, 0, 0) • (0, 1, 0) = 0
i • k = (1, 0, 0) • (0, 0, 1) = 0
j • k = (0, 1, 0) • (0, 0, 1) = 0



In Example 1 there is no need to check that

j • i = k • i = k • j = 0

since this follows from computations in the example and
the symmetry property of the dot product.
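The orthogonality checks in Example 1 can also be carried out mechanically. The following is a minimal Python sketch, not part of the text; the helper names `dot` and `is_orthogonal_set` are illustrative assumptions. It tests each pair of distinct vectors only once, which is justified by the symmetry of the dot product.

```python
def dot(u, v):
    """Euclidean dot product of two vectors with the same number of components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(vectors):
    """True if every pair of distinct vectors in the list has dot product 0.

    Only pairs (i, j) with i < j are tested, since u . v = v . u.
    """
    return all(
        dot(vectors[i], vectors[j]) == 0
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )


# Example 1(a): two vectors in R^4
u = (-2, 3, 1, 4)
v = (1, 2, 0, -1)
print(dot(u, v))  # 0

# Example 1(b): the standard unit vectors in R^3
i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(is_orthogonal_set([i, j, k]))  # True
```

With integer components the comparison with 0 is exact; for floating-point data a small tolerance would be a safer choice.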



Lines and Planes Determined by Points and Normals 



One learns in analytic geometry that a line in R^2 is determined uniquely by its slope and one of its points, and that a plane in R^3 is
determined uniquely by its "inclination" and one of its points. One way of specifying slope and inclination is to use a nonzero
vector n, called a normal, that is orthogonal to the line or plane in question. For example, Figure 3.3.1 shows the line through the
point P0(x0, y0) that has normal n = (a, b) and the plane through the point P0(x0, y0, z0) that has normal n = (a, b, c). Both
the line and the plane are represented by the vector equation

$$\mathbf{n} \cdot \overrightarrow{P_0P} = 0 \qquad (1)$$

where P is either an arbitrary point (x, y) on the line or an arbitrary point (x, y, z) in the plane. The vector $\overrightarrow{P_0P}$ can be expressed
in terms of components as

$$\overrightarrow{P_0P} = (x - x_0,\; y - y_0) \quad [\text{line}]$$
$$\overrightarrow{P_0P} = (x - x_0,\; y - y_0,\; z - z_0) \quad [\text{plane}]$$

Thus, Formula 1 can be written as

$$a(x - x_0) + b(y - y_0) = 0 \quad [\text{line}] \qquad (2)$$

$$a(x - x_0) + b(y - y_0) + c(z - z_0) = 0 \quad [\text{plane}] \qquad (3)$$

These are called the point-normal equations of the line and plane.

EXAMPLE 2 Point-Normal Equations

It follows from 2 that in R^2 the equation

6(x - 3) + (y + 7) = 0

represents the line through the point (3, -7) with normal n = (6, 1); and it follows from 3 that in R^3 the equation

4(x - 3) + 2y - 5(z - 7) = 0

represents the plane through the point (3, 0, 7) with normal n = (4, 2, -5).
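Multiplying out a point-normal equation and combining the constants is a purely mechanical step. The sketch below is illustrative only (the function name `general_form` is an assumption, not notation from the text); it returns the coefficients and the constant term of the expanded equation n • x + constant = 0 for the two equations of Example 2.

```python
def general_form(normal, point):
    """Expand the point-normal equation n . (x - p) = 0.

    Returns (coefficients, constant) such that the equation reads
    coefficients . x + constant = 0, where constant = -(n . p).
    """
    if all(n == 0 for n in normal):
        raise ValueError("the normal must be a nonzero vector")
    constant = -sum(n * p for n, p in zip(normal, point))
    return tuple(normal), constant


# Line through (3, -7) with normal (6, 1):  6x + y - 11 = 0
print(general_form((6, 1), (3, -7)))        # ((6, 1), -11)

# Plane through (3, 0, 7) with normal (4, 2, -5):  4x + 2y - 5z + 23 = 0
print(general_form((4, 2, -5), (3, 0, 7)))  # ((4, 2, -5), 23)
```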




When convenient, the terms in Equations 2 and 3 can be multiplied out and the constants combined. This leads to the following 
theorem. 



THEOREM 3.3.1

(a) If a and b are constants that are not both zero, then an equation of the form

ax + by + c = 0 (4)

represents a line in R^2 with normal n = (a, b).

(b) If a, b, and c are constants that are not all zero, then an equation of the form

ax + by + cz + d = 0 (5)

represents a plane in R^3 with normal n = (a, b, c).

EXAMPLE 3 Vectors Orthogonal to Lines and Planes Through the Origin

(a) The equation ax + by = 0 represents a line through the origin in R^2. Show that the vector n1 = (a, b) formed
from the coefficients of the equation is orthogonal to the line, that is, orthogonal to every vector along the line.

(b) The equation ax + by + cz = 0 represents a plane through the origin in R^3. Show that the vector n2 = (a, b, c)
formed from the coefficients of the equation is orthogonal to the plane, that is, orthogonal to every vector that
lies in the plane.

Solution We will solve both problems together. The two equations can be written as

(a, b) • (x, y) = 0 and (a, b, c) • (x, y, z) = 0

or, alternatively, as

n1 • (x, y) = 0 and n2 • (x, y, z) = 0

These equations show that n1 is orthogonal to every vector (x, y) on the line and that n2 is orthogonal to every
vector (x, y, z) in the plane (Figure 3.3.1).



Recall that

ax + by = 0 and ax + by + cz = 0

are called homogeneous equations. Example 3 illustrates that homogeneous equations in two or three unknowns can be written in
the vector form

n • x = 0 (6)

where n is the vector of coefficients and x is the vector of unknowns. In R^2 this is called the vector form of a line through the
origin, and in R^3 it is called the vector form of a plane through the origin.

Referring to Table 1 of Section 3.2, in what other ways
can you write 6 if n and x are expressed in matrix form?



Orthogonal Projections 

In many applications it is necessary to "decompose" a vector u into a sum of two terms, one term being a scalar multiple of a
specified nonzero vector a and the other term being orthogonal to a. For example, if u and a are vectors in R^2 that are positioned
so their initial points coincide at a point Q, then we can create such a decomposition as follows (Figure 3.3.2):

• Drop a perpendicular from the tip of u to the line through a.

• Construct the vector w1 from Q to the foot of the perpendicular.

• Construct the vector w2 = u - w1.

Figure 3.3.2 In parts (b) through (d), u = w1 + w2, where w1 is parallel to a and w2 is orthogonal to a.

Since

w1 + w2 = w1 + (u - w1) = u

we have decomposed u into a sum of two orthogonal vectors, the first term being a scalar multiple of a and the second being
orthogonal to a.

The following theorem shows that the foregoing results, which we illustrated using vectors in R^2, apply as well in R^n.


THEOREM 3.3.2 Projection Theorem

If u and a are vectors in R^n, and if a ≠ 0, then u can be expressed in exactly one way in the form u = w1 + w2, where
w1 is a scalar multiple of a and w2 is orthogonal to a.

Proof Since the vector w1 is to be a scalar multiple of a, it must have the form

$$\mathbf{w}_1 = k\mathbf{a} \qquad (7)$$

Our goal is to find a value of the scalar k and a vector w2 that is orthogonal to a such that

$$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 \qquad (8)$$

We can determine k by using 7 to rewrite 8 as

$$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 = k\mathbf{a} + \mathbf{w}_2$$

and then applying Theorems 3.2.2 and 3.2.3 to obtain

$$\mathbf{u} \cdot \mathbf{a} = (k\mathbf{a} + \mathbf{w}_2) \cdot \mathbf{a} = k\|\mathbf{a}\|^2 + (\mathbf{w}_2 \cdot \mathbf{a}) \qquad (9)$$

Since w2 is to be orthogonal to a, the last term in 9 must be 0, and hence k must satisfy the equation

$$\mathbf{u} \cdot \mathbf{a} = k\|\mathbf{a}\|^2$$

from which we obtain

$$k = \frac{\mathbf{u} \cdot \mathbf{a}}{\|\mathbf{a}\|^2}$$

as the only possible value for k. The proof can be completed by rewriting 8 as

$$\mathbf{w}_2 = \mathbf{u} - \mathbf{w}_1 = \mathbf{u} - k\mathbf{a} = \mathbf{u} - \frac{\mathbf{u} \cdot \mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a}$$

and then confirming that w2 is orthogonal to a by showing that w2 • a = 0 (we leave the details for you).

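The vectors w1 and w2 in the Projection Theorem have associated names: the vector w1 is called the orthogonal projection of u
on a or sometimes the vector component of u along a, and the vector w2 is called the vector component of u orthogonal to a. The
vector w1 is commonly denoted by the symbol proj_a u, in which case it follows from 8 that w2 = u - proj_a u. In summary,

$$\operatorname{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u} \cdot \mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} \quad (\text{vector component of } \mathbf{u} \text{ along } \mathbf{a}) \qquad (10)$$

$$\mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u} = \mathbf{u} - \frac{\mathbf{u} \cdot \mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} \quad (\text{vector component of } \mathbf{u} \text{ orthogonal to } \mathbf{a}) \qquad (11)$$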



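EXAMPLE 4 Orthogonal Projection on a Line

Find the orthogonal projections of the vectors e1 = (1, 0) and e2 = (0, 1) on the line L that makes an angle θ with
the positive x-axis in R^2.

Solution As illustrated in Figure 3.3.3, a = (cos θ, sin θ) is a unit vector along the line L, so our first problem is
to find the orthogonal projection of e1 along a. Since

$$\|\mathbf{a}\| = \sqrt{\sin^2\theta + \cos^2\theta} = 1 \quad\text{and}\quad \mathbf{e}_1 \cdot \mathbf{a} = (1, 0) \cdot (\cos\theta, \sin\theta) = \cos\theta$$

it follows from Formula 10 that this projection is

$$\operatorname{proj}_{\mathbf{a}}\mathbf{e}_1 = \frac{\mathbf{e}_1 \cdot \mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = (\cos\theta)(\cos\theta, \sin\theta) = (\cos^2\theta,\; \sin\theta\cos\theta)$$

Similarly, since e2 • a = (0, 1) • (cos θ, sin θ) = sin θ, it follows from Formula 10 that

$$\operatorname{proj}_{\mathbf{a}}\mathbf{e}_2 = \frac{\mathbf{e}_2 \cdot \mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = (\sin\theta)(\cos\theta, \sin\theta) = (\sin\theta\cos\theta,\; \sin^2\theta)$$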



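EXAMPLE 5 Vector Component of u Along a

Let u = (2, -1, 3) and a = (4, -1, 2). Find the vector component of u along a and the vector component of u
orthogonal to a.

Solution

u • a = (2)(4) + (-1)(-1) + (3)(2) = 15
||a||^2 = 4^2 + (-1)^2 + 2^2 = 21

Thus the vector component of u along a is

$$\operatorname{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u} \cdot \mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = \frac{15}{21}(4, -1, 2) = \left(\frac{20}{7}, -\frac{5}{7}, \frac{10}{7}\right)$$

and the vector component of u orthogonal to a is

$$\mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u} = (2, -1, 3) - \left(\frac{20}{7}, -\frac{5}{7}, \frac{10}{7}\right) = \left(-\frac{6}{7}, -\frac{2}{7}, \frac{11}{7}\right)$$

As a check, you may wish to verify that the vectors u - proj_a u and a are perpendicular by showing that their dot
product is zero.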
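The check suggested at the end of Example 5 is easy to automate. The following Python sketch is an illustration under stated assumptions: the names `proj` and `orth_component` are hypothetical, and the components are assumed to be integers so that `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))


def proj(u, a):
    """Vector component of u along a: ((u . a) / ||a||^2) a, as in Formula 10."""
    norm_sq = dot(a, a)
    if norm_sq == 0:
        raise ValueError("a must be a nonzero vector")
    k = Fraction(dot(u, a), norm_sq)  # the scalar k from the Projection Theorem
    return tuple(k * ai for ai in a)


def orth_component(u, a):
    """Vector component of u orthogonal to a: u - proj_a u, as in Formula 11."""
    return tuple(ui - pi for ui, pi in zip(u, proj(u, a)))


u = (2, -1, 3)
a = (4, -1, 2)
w1 = proj(u, a)
w2 = orth_component(u, a)
print(", ".join(str(c) for c in w1))  # 20/7, -5/7, 10/7
print(", ".join(str(c) for c in w2))  # -6/7, -2/7, 11/7
print(dot(w2, a))                     # 0
```

The final line confirms that the second component is orthogonal to a, which is exactly the property the Projection Theorem guarantees.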




Figure 3.3.3 



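Sometimes we will be more interested in the norm of the vector component of u along a than in the vector component itself. A
formula for this norm can be derived as follows:

$$\|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \left\|\frac{\mathbf{u} \cdot \mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a}\right\| = \left|\frac{\mathbf{u} \cdot \mathbf{a}}{\|\mathbf{a}\|^2}\right| \|\mathbf{a}\| = \frac{|\mathbf{u} \cdot \mathbf{a}|}{\|\mathbf{a}\|^2}\,\|\mathbf{a}\|$$

where the second equality follows from part (c) of Theorem 3.2.1 and the third from the fact that ||a|| > 0. Thus,

$$\|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \frac{|\mathbf{u} \cdot \mathbf{a}|}{\|\mathbf{a}\|} \qquad (12)$$

If θ denotes the angle between u and a, then u • a = ||u|| ||a|| cos θ, so 12 can also be written as

$$\|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \|\mathbf{u}\|\,|\cos\theta| \qquad (13)$$

(Verify.) A geometric interpretation of this result is given in Figure 3.3.4.

Figure 3.3.4 (a) 0 ≤ θ < π/2; (b) π/2 < θ ≤ π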



The Theorem of Pythagoras 

In Section 3.2 we found that many theorems about vectors in R^2 and R^3 also hold in R^n. Another example of this is the following
generalization of the Theorem of Pythagoras (Figure 3.3.5).

THEOREM 3.3.3 Theorem of Pythagoras in R^n

If u and v are orthogonal vectors in R^n with the Euclidean inner product, then

$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \qquad (14)$$

Proof Since u and v are orthogonal, we have u • v = 0, from which it follows that

$$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$$



EXAMPLE 6 Theorem of Pythagoras in R^4

We showed in Example 1 that the vectors

u = (-2, 3, 1, 4) and v = (1, 2, 0, -1)

are orthogonal. Verify the Theorem of Pythagoras for these vectors.

Solution We leave it for you to confirm that

u + v = (-1, 5, 1, 3)
||u + v||^2 = 36
||u||^2 + ||v||^2 = 30 + 6

Thus, ||u + v||^2 = ||u||^2 + ||v||^2
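The confirmation left to the reader in Example 6 can be done numerically. The short Python sketch below is illustrative (the helper names are assumptions). It also evaluates a non-orthogonal pair, where the two sides differ by exactly 2(u • v), the cross term that appears in the proof of the theorem.

```python
def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))


def norm_sq(u):
    """Squared Euclidean norm ||u||^2 = u . u."""
    return dot(u, u)


def add(u, v):
    """Componentwise sum u + v."""
    return tuple(a + b for a, b in zip(u, v))


# The orthogonal pair from Example 6
u = (-2, 3, 1, 4)
v = (1, 2, 0, -1)
print(norm_sq(add(u, v)), norm_sq(u) + norm_sq(v))  # 36 36

# A pair that is not orthogonal: the gap equals 2 (p . q)
p, q = (1, 0), (1, 1)
gap = norm_sq(add(p, q)) - (norm_sq(p) + norm_sq(q))
print(gap, 2 * dot(p, q))  # 2 2
```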




Figure 3.3.5 



OPTIONAL 

Distance Problems 

We will now show how orthogonal projections can be used to solve the following three distance problems:
Problem 1. Find the distance between a point and a line in R^2.
Problem 2. Find the distance between a point and a plane in R^3.
Problem 3. Find the distance between two parallel planes in R^3.

A method for solving the first two problems is provided by the next theorem. Since the proofs of the two parts are similar, we will
prove part (b) and leave part (a) as an exercise.

THEOREM 3.3.4

(a) In R^2 the distance D between the point P0(x0, y0) and the line ax + by + c = 0 is

$$D = \frac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}} \qquad (15)$$

(b) In R^3 the distance D between the point P0(x0, y0, z0) and the plane ax + by + cz + d = 0 is

$$D = \frac{|ax_0 + by_0 + cz_0 + d|}{\sqrt{a^2 + b^2 + c^2}} \qquad (16)$$

Proof (b) Let Q(x1, y1, z1) be any point in the plane. Position the normal n = (a, b, c) so that its initial point is at Q. As
illustrated in Figure 3.3.6, the distance D is equal to the length of the orthogonal projection of $\overrightarrow{QP_0}$ on n. Thus, it follows from
Formula 12 that

$$D = \|\operatorname{proj}_{\mathbf{n}}\overrightarrow{QP_0}\| = \frac{|\overrightarrow{QP_0} \cdot \mathbf{n}|}{\|\mathbf{n}\|}$$

But

$$\overrightarrow{QP_0} = (x_0 - x_1,\; y_0 - y_1,\; z_0 - z_1)$$
$$\overrightarrow{QP_0} \cdot \mathbf{n} = a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1)$$
$$\|\mathbf{n}\| = \sqrt{a^2 + b^2 + c^2}$$

Thus

$$D = \frac{|a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1)|}{\sqrt{a^2 + b^2 + c^2}} \qquad (17)$$

Since the point Q(x1, y1, z1) lies in the given plane, its coordinates satisfy the equation of that plane; thus

$$ax_1 + by_1 + cz_1 + d = 0$$

or

$$d = -ax_1 - by_1 - cz_1$$

Substituting this expression in 17 yields 16.



EXAMPLE 7 Distance Between a Point and a Plane

Find the distance D between the point (1, -4, -3) and the plane 2x - 3y + 6z = -1.

Solution Since the distance formulas in Theorem 3.3.4 require that the equations of the line and plane be written
with zero on the right side, we first need to rewrite the equation of the plane as

2x - 3y + 6z + 1 = 0

from which we obtain

$$D = \frac{|2(1) + (-3)(-4) + 6(-3) + 1|}{\sqrt{2^2 + (-3)^2 + 6^2}} = \frac{|-3|}{7} = \frac{3}{7}$$

Figure 3.3.6 Distance from P0 to plane.

The third distance problem posed above is to find the distance between two parallel planes in R^3. As suggested in Figure 3.3.7, the
distance between a plane V and a plane W can be obtained by finding any point P0 in one of the planes, and computing the
distance between that point and the other plane. Here is an example.

Figure 3.3.7 The distance between the parallel planes V and W is equal to the distance between P0 and W.

EXAMPLE 8 Distance Between Parallel Planes

The planes

x + 2y - 2z = 3 and 2x + 4y - 4z = 7

are parallel since their normals, (1, 2, -2) and (2, 4, -4), are parallel vectors. Find the distance between these
planes.

Solution To find the distance D between the planes, we can select an arbitrary point in one of the planes and
compute its distance to the other plane. By setting y = z = 0 in the equation x + 2y - 2z = 3, we obtain the point
P0(3, 0, 0) in this plane. From 16, the distance between P0 and the plane 2x + 4y - 4z = 7 is

$$D = \frac{|2(3) + 4(0) + (-4)(0) - 7|}{\sqrt{2^2 + 4^2 + (-4)^2}} = \frac{1}{6}$$
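Both parts of Theorem 3.3.4 have the same shape, so a single routine can cover lines in R^2 and planes in R^3. The following Python sketch is an illustration, not the text's own method; the function name `distance_to_hyperplane` and its argument order are assumptions. The equation must first be written with zero on the right side, as in Examples 7 and 8.

```python
import math


def distance_to_hyperplane(coeffs, const, point):
    """Distance from `point` to the set coeffs . x + const = 0.

    With two coefficients this is the point-line formula of Theorem 3.3.4(a);
    with three it is the point-plane formula of Theorem 3.3.4(b).
    """
    norm = math.sqrt(sum(c * c for c in coeffs))
    if norm == 0:
        raise ValueError("the coefficient (normal) vector must be nonzero")
    return abs(sum(c * p for c, p in zip(coeffs, point)) + const) / norm


# Example 7: point (1, -4, -3), plane 2x - 3y + 6z + 1 = 0; exact value 3/7
d7 = distance_to_hyperplane((2, -3, 6), 1, (1, -4, -3))
print(math.isclose(d7, 3 / 7))  # True

# Example 8: point (3, 0, 0) on x + 2y - 2z = 3, plane 2x + 4y - 4z - 7 = 0; exact value 1/6
d8 = distance_to_hyperplane((2, 4, -4), -7, (3, 0, 0))
print(math.isclose(d8, 1 / 6))  # True
```

Because the result is a floating-point number, the comparisons use `math.isclose` rather than exact equality.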



Concept Review 

• Orthogonal (perpendicular) vectors 

• Orthogonal set of vectors 

• Normal to a line 

• Normal to a plane 

• Point-normal equations 

• Vector form of a line 

• Vector form of a plane 

• Orthogonal projection of u on a 

• Vector component of u along a 

• Vector component of u orthogonal to a 

• Theorem of Pythagoras 

Skills 

• Determine whether two vectors are orthogonal. 

• Determine whether a given set of vectors forms an orthogonal set. 

• Find equations for lines (or planes) by using a normal vector and a point on the line (or plane). 

• Find the vector form of a line or plane through the origin. 

• Compute the vector component of u along a and orthogonal to a. 



• Find the distance between a point and a line in R^2 or R^3.

• Find the distance between two parallel planes in R^3.

• Find the distance between a point and a plane. 



Exercise Set 3.3 

In Exercises 1-2, determine whether u and v are orthogonal vectors. 

1. (a) u = (6, 1, 4), v = (2, 0, -3)

(b) u = (0, 0, -1), v = (1, 1, 1)

(c) u = (-6, 0, 4), v = (3, 1, 6)

(d) u = (2, 4, -8), v = (5, 3, 7)

Answer: 

(a) Orthogonal 

(b) Not orthogonal 

(c) Not orthogonal 

(d) Not orthogonal 

2. (a) u = (2, 3), v = (5, -7)

(b) u = (-6, -2), v = (4, 0)

(c) u = (1, -5, 4), v = (3, 3, 3)

(d) u = (-2, 2, 3), v = (1, 7, -4)

In Exercises 3-4, determine whether the vectors form an orthogonal set. 

3. (a) v1 = (2, 3), v2 = (3, 2)

(b) v1 = (-1, 1), v2 = (1, 1)

(c) v1 = (-2, 1, 1), v2 = (1, 0, 2), v3 = (-2, -5, 1)

(d) v1 = (-3, 4, -1), v2 = (1, 2, 5), v3 = (4, -3, 0)

Answer: 

(a) Not an orthogonal set 

(b) Orthogonal set 

(c) Orthogonal set 

(d) Not an orthogonal set 

4. (a) v1 = (2, 3), v2 = (-3, 2)

(b) v1 = (1, -2), v2 = (-2, 1)

(c) v1 = (1, 0, 1), v2 = (1, 1, 1), v3 = (-1, 0, 1)

(d) v1 = (2, -2, 1), v2 = (2, 1, -2), v3 = (1, 2, 2)

5. Find a unit vector that is orthogonal to both u = ( 1 , 0, 1 ) and v = (0, 



Answer: 



^[k h'~h] 

6. (a) Show that v = (a, b) and w = (-b, a) are orthogonal vectors.

(b) Use the result in part (a) to find two vectors that are orthogonal to v = (2, -3).

(c) Find two unit vectors that are orthogonal to (-3, 4).

7. Do the points A(1, 1, 1), B(-2, 0, 3), and C(-3, -1, 1) form the vertices of a right triangle? Explain your answer.
Answer:

Yes

8. Repeat Exercise 7 for the points A(3, 0, 2), B(4, 3, 0), and C(8, 1, -1).

In Exercises 9-12, find a point-normal form of the equation of the plane passing through P and having n as a normal. 

9. P(-1, 3, -2); n = (-2, 1, -1)
Answer:

-2(x + 1) + (y - 3) - (z + 2) = 0

10. P(1, 1, 4); n = (1, 9, 8)

11. P(2, 0, 0); n = (0, 0, 2)

Answer:
2z = 0

12. P(0, 0, 0); n = (1, 2, 3)

In Exercises 13-16, determine whether the given planes are parallel. 
13. 4x - y + 2z = 5 and 7x - 3y + 4z = 8
Answer: 

Not parallel 

14. x - 4y - 3z - 2 = 0 and 3x - 12y - 9z - 7 = 0
15. 2y = 8x - 4z + 5 and x = (1/2)z + (1/4)y

Answer: 

Parallel 

16. (-4, 1, 2) • (x, y, z) = 0 and (8, -2, -4) • (x, y, z) = 0

In Exercises 17-18, determine whether the given planes are perpendicular. 

17. 3x - y + z - 4 = 0, x + 2z = -1

Answer: 

Not perpendicular 

18. x - 2y + 3z = 4, -2x + 5y + 4z = -1
In Exercises 19-20, find ||proj_a u||.

19. (a) u = (1, -2), a = (-4, -3)
(b) u = (3, 0, 4), a = (2, 3, 3)

Answer:

(a) 2/5

(b) 18/√22

20. (a) u = (5, 6), a = (2, -1)

(b) u = (3, -2, 6), a = (1, 2, -7)

In Exercises 21-28, find the vector component of u along a and the vector component of u orthogonal to a. 

21. u = (6, 2), a = (3, -9)
Answer:

(0, 0), (6, 2)

22. u = (-1, -2), a = (-2, 3)

23. u = (3, 1, -7), a = (1, 0, 5)

Answer:

(-16/13, 0, -80/13), (55/13, 1, -11/13)

24. u = (1, 0, 0), a = (4, 3, 8)

25. u = (1, 1, 1), a = (0, 2, -1)

Answer:

(0, 2/5, -1/5), (1, 3/5, 6/5)

26. u = (2, 0, 1), a = (1, 2, 3)

27. u = (2, 1, 1, 2), a = (4, -4, 2, -2)

Answer:

(1/5, -1/5, 1/10, -1/10), (9/5, 6/5, 9/10, 21/10)

28. u = (5, 0, -3, 7), a = (2, 1, -1, -1)

In Exercises 29-32, find the distance between the point and the line. 

29. 4x + 3y + 4 = 0; (-3, 1)

Answer:

1

30. x - 3y + 2 = 0; (-1, 4)

31. y = -4x + 2; (2, -5)

Answer:

1/√17

32. 3x + y = 5; (1, 8)

In Exercises 33-36, find the distance between the point and the plane. 



33. (3, 1, -2); x + 2y - 2z = 4

Answer:

5/3

34. (-1, -1, 2); 2x + 5y - 6z = 4

35. (-1, 2, 1); 2x + 3y - 4z = 1

Answer:

1/√29

36. (0, 3, -2); x - y - z = 3

In Exercises 37-40, find the distance between the given parallel planes.

37. 2x - y - z = 5 and -4x + 2y + 2z = 12
Answer:

11/√6

38. 3x - 4y + z = 1 and 6x - 8y + 2z = 3

39. -4x + y - 3z = 0 and 8x - 2y + 6z = 0

Answer:

0 (The planes coincide.)

40. 2x - y + z = 1 and 2x - y + z = -1

41. Let i, j, and k be unit vectors along the positive x, y, and z axes of a rectangular coordinate system in 3-space. If v = (a, b, c)
is a nonzero vector, then the angles α, β, and γ between v and the vectors i, j, and k, respectively, are called the direction
angles of v (Figure Ex-41), and the numbers cos α, cos β, and cos γ are called the direction cosines of v.

(a) Show that cos α = a / ||v||.

(b) Find cos β and cos γ.

(c) Show that v / ||v|| = (cos α, cos β, cos γ).

(d) Show that cos^2 α + cos^2 β + cos^2 γ = 1.

Figure Ex-41

Answer:

(b) cos β = b / ||v||, cos γ = c / ||v||

42. Use the result in Exercise 41 to estimate, to the nearest degree, the angles that a diagonal of a box with dimensions
10 cm × 15 cm × 25 cm makes with the edges of the box.

43. Show that if v is orthogonal to both w1 and w2, then v is orthogonal to k1w1 + k2w2 for all scalars k1 and k2.

44. Let u and v be nonzero vectors in 2- or 3-space, and let k = ||u|| and l = ||v||. Show that the vector w = lu + kv bisects the
angle between u and v.

45. Prove part (a) of Theorem 3.3.4. 

46. Is it possible to have 

proj_a u = proj_u a ?

Explain your reasoning. 

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The vectors (3, -1, 2) and (0, 0, 0) are orthogonal.
Answer: 

True 

(b) If u and v are orthogonal vectors, then for all nonzero scalars k and m, ku and mv are orthogonal vectors.
Answer: 

True 

(c) The orthogonal projection of u along a is perpendicular to the vector component of u orthogonal to a. 
Answer: 

True 

(d) If a and b are orthogonal vectors, then for every nonzero vector u, we have 

proj_a(proj_b(u)) = 0

Answer: 

True 

(e) If a and u are nonzero vectors, then 

proj_a(proj_a(u)) = proj_a(u)

Answer: 

True 

(f) If the relationship 

proj_a u = proj_a v

holds for some nonzero vector a, then u = v.
Answer: 

False 

(g) For all vectors u and v, it is true that 

||u + v|| = ||u|| + ||v||

Answer: 

False 
3.4 The Geometry of Linear Systems



In this section we will use parametric and vector methods to study general systems of linear equations. This work will enable us to interpret
solution sets of linear systems with n unknowns as geometric objects in R^n just as we interpreted solution sets of linear systems with two
and three unknowns as points, lines, and planes in R^2 and R^3.



Vector and Parametric Equations of Lines in R^2 and R^3

In the last section we derived equations of lines and planes that are determined by a point and a normal vector. However, there are other
useful ways of specifying lines and planes. For example, a unique line in R^2 or R^3 is determined by a point x0 on the line and a nonzero
vector v parallel to the line, and a unique plane in R^3 is determined by a point x0 in the plane and two noncollinear vectors v1 and v2
parallel to the plane. The best way to visualize this is to translate the vectors so their initial points are at x0 (Figure 3.4.1).

Figure 3.4.1

Let us begin by deriving an equation for the line L that contains the point x0 and is parallel to v. If x is a general point on such a line, then,
as illustrated in Figure 3.4.2, the vector x - x0 will be some scalar multiple of v, say

x - x0 = tv or equivalently x = x0 + tv

As the variable t (called a parameter) varies from -∞ to ∞, the point x traces out the line L. Accordingly, we have the following result.

Figure 3.4.2

THEOREM 3.4.1

Let L be the line in R^2 or R^3 that contains the point x0 and is parallel to the nonzero vector v. Then the equation of the line through
x0 that is parallel to v is

x = x0 + tv (1)

If x0 = 0, then the line passes through the origin and the equation has the form

x = tv (2)

Although it is not stated explicitly, it is understood in
Formulas 1 and 2 that the parameter t varies from -∞ to ∞.
This applies to all vector and parametric equations in this text
except where stated otherwise.



Vector and Parametric Equations of Planes in R^3

Next we will derive an equation for the plane W that contains the point x0 and is parallel to the noncollinear vectors v1 and v2. As shown in
Figure 3.4.3, if x is any point in the plane, then by forming suitable scalar multiples of v1 and v2, say t1v1 and t2v2, we can create a
parallelogram with diagonal x - x0 and adjacent sides t1v1 and t2v2. Thus, we have

x - x0 = t1v1 + t2v2 or equivalently x = x0 + t1v1 + t2v2

Figure 3.4.3

As the variables t1 and t2 (called parameters) vary independently from -∞ to ∞, the point x varies over the entire plane W. Accordingly,
we have the following result.

THEOREM 3.4.2

Let W be the plane in R^3 that contains the point x0 and is parallel to the noncollinear vectors v1 and v2. Then an equation of the
plane through x0 that is parallel to v1 and v2 is given by

x = x0 + t1v1 + t2v2 (3)

If x0 = 0, then the plane passes through the origin and the equation has the form

x = t1v1 + t2v2 (4)

Remark Observe that the line through x0 represented by Equation 1 is the translation by x0 of the line through the origin represented by
Equation 2 and that the plane through x0 represented by Equation 3 is the translation by x0 of the plane through the origin represented by
Equation 4 (Figure 3.4.4).

Figure 3.4.4



Motivated by the forms of Formulas 1 to 4, we can extend the notions of line and plane to R^n by making the following definitions.

DEFINITION 1

If x0 and v are vectors in R^n, and if v is nonzero, then the equation

x = x0 + tv (5)

defines the line through x0 that is parallel to v. In the special case where x0 = 0, the line is said to pass through the origin.

DEFINITION 2

If x0, v1, and v2 are vectors in R^n, and if v1 and v2 are not collinear, then the equation

x = x0 + t1v1 + t2v2 (6)

defines the plane through x0 that is parallel to v1 and v2. In the special case where x0 = 0, the plane is said to pass through the
origin.

Equations 5 and 6 are called vector forms of a line and plane in R^n. If the vectors in these equations are expressed in terms of their
components and the corresponding components on each side are equated, then the resulting equations are called parametric equations of
the line and plane. Here are some examples.

EXAMPLE 1 Vector and Parametric Equations of Lines in R^2 and R^3

(a) Find a vector equation and parametric equations of the line in R^2 that passes through the origin and is parallel to the
vector v = (-2, 3).

(b) Find a vector equation and parametric equations of the line in R^3 that passes through the point P0(1, 2, -3) and is
parallel to the vector v = (4, -5, 1).

(c) Use the vector equation obtained in part (b) to find two points on the line that are different from P0.

Solution

(a) It follows from 5 with x0 = 0 that a vector equation of the line is x = tv. If we let x = (x, y), then this equation can be
expressed in vector form as

(x, y) = t(-2, 3)

Equating corresponding components on the two sides of this equation yields the parametric equations

x = -2t, y = 3t

(b) It follows from 5 that a vector equation of the line is x = x0 + tv. If we let x = (x, y, z), and if we take
x0 = (1, 2, -3), then this equation can be expressed in vector form as

(x, y, z) = (1, 2, -3) + t(4, -5, 1) (7)

Equating corresponding components on the two sides of this equation yields the parametric equations

x = 1 + 4t, y = 2 - 5t, z = -3 + t

(c) A point on the line represented by Equation 7 can be obtained by substituting a specific numerical value for the
parameter t. However, since t = 0 produces (x, y, z) = (1, 2, -3), which is the point P0, this value of t does not serve
our purpose. Taking t = 1 produces the point (5, -3, -2) and taking t = -1 produces the point (-3, 7, -4). Any
other distinct values for t (except t = 0) would work just as well.
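Generating points on a line from its vector equation x = x0 + tv amounts to one componentwise computation per value of t. The Python sketch below is illustrative (the function name `point_on_line` is an assumption); it reproduces the points found in part (c) of Example 1.

```python
def point_on_line(x0, v, t):
    """Return the point x0 + t v on the line through x0 parallel to v."""
    if all(c == 0 for c in v):
        raise ValueError("the direction vector v must be nonzero")
    return tuple(p + t * d for p, d in zip(x0, v))


x0 = (1, 2, -3)   # the point P0 from Example 1(b)
v = (4, -5, 1)    # the direction vector

print(point_on_line(x0, v, 0))   # (1, 2, -3), which is P0 itself
print(point_on_line(x0, v, 1))   # (5, -3, -2)
print(point_on_line(x0, v, -1))  # (-3, 7, -4)
```

The same function works unchanged in R^n, since it only relies on x0 and v having the same number of components.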



EXAMPLE 2 Vector and Parametric Equations of a Plane in R^3

Find vector and parametric equations of the plane x - y + 2z = 5.

Solution We will find the parametric equations first. We can do this by solving the equation for any one of the variables in
terms of the other two and then using those two variables as parameters. For example, solving for x in terms of y and z yields

x = 5 + y - 2z (8)

and then using y and z as parameters t1 and t2, respectively, yields the parametric equations

x = 5 + t1 - 2t2, y = t1, z = t2

We would have obtained different parametric and
vector equations in Example 2 had we solved 8 for y or
z rather than x. However, one can show the same plane
results in all three cases as the parameters vary from
-∞ to ∞.

To obtain a vector equation of the plane we rewrite these parametric equations as

(x, y, z) = (5 + t1 - 2t2, t1, t2)

or, equivalently, as

(x, y, z) = (5, 0, 0) + t1(1, 1, 0) + t2(-2, 0, 1)
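One way to gain confidence in a parametrization such as the one in Example 2 is to substitute many parameter values and confirm that every resulting point satisfies the original equation x - y + 2z = 5. A minimal Python sketch follows; the function names are illustrative assumptions, and the check is a spot test over a grid of integer parameters rather than a proof.

```python
def plane_point(t1, t2):
    """Point (5, 0, 0) + t1 (1, 1, 0) + t2 (-2, 0, 1) from Example 2."""
    x0, v1, v2 = (5, 0, 0), (1, 1, 0), (-2, 0, 1)
    return tuple(p + t1 * a + t2 * b for p, a, b in zip(x0, v1, v2))


def in_plane(point):
    """True if the point satisfies x - y + 2z = 5."""
    x, y, z = point
    return x - y + 2 * z == 5


# Every sampled pair of parameters gives a point in the plane.
print(all(in_plane(plane_point(t1, t2))
          for t1 in range(-3, 4)
          for t2 in range(-3, 4)))  # True

print(plane_point(2, 1))  # (5, 2, 1)
```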



EXAMPLE 3 Vector and Parametric Equations of Lines and Planes in R^4

(a) Find vector and parametric equations of the line through the origin of R^4 that is parallel to the vector v = (5, -3, 6, 1).

(b) Find vector and parametric equations of the plane in R^4 that passes through the point x0 = (2, -1, 0, 3) and is parallel
to both v1 = (1, 5, 2, -4) and v2 = (0, 7, -8, 6).

Solution

(a) If we let x = (x1, x2, x3, x4), then the vector equation x = tv can be expressed as

(x1, x2, x3, x4) = t(5, -3, 6, 1)

Equating corresponding components yields the parametric equations

x1 = 5t, x2 = -3t, x3 = 6t, x4 = t

(b) The vector equation x = x0 + t1v1 + t2v2 can be expressed as

(x1, x2, x3, x4) = (2, -1, 0, 3) + t1(1, 5, 2, -4) + t2(0, 7, -8, 6)

which yields the parametric equations

x1 = 2 + t1
x2 = -1 + 5t1 + 7t2
x3 = 2t1 - 8t2
x4 = 3 - 4t1 + 6t2



Lines Through Two Points in R^n

If x0 and x1 are distinct points in R^n, then the line determined by these points is parallel to the vector v = x1 - x0 (Figure 3.4.5), so it
follows from 5 that the line can be expressed in vector form as

x = x0 + t(x1 - x0) (9)

or, equivalently, as

x = (1 - t)x0 + tx1 (10)

These are called the two-point vector equations of a line in R^n.

EXAMPLE 4 A Line Through Two Points in R^2

Find vector and parametric equations for the line in R^2 that passes through the points P(0, 7) and Q(5, 0).

Solution We will see below that it does not matter which point we take to be x0 and which we take to be x1, so let us
choose x0 = (0, 7) and x1 = (5, 0). It follows that x1 - x0 = (5, -7) and hence that

(x, y) = (0, 7) + t(5, -7) (11)

which we can rewrite in parametric form as

x = 5t, y = 7 - 7t

Had we reversed our choices and taken x0 = (5, 0) and x1 = (0, 7), then the resulting vector equation would have been

(x, y) = (5, 0) + t(-5, 7) (12)

and the parametric equations would have been

x = 5 - 5t, y = 7t

(verify). Although 11 and 12 look different, they both represent the line whose equation in rectangular coordinates is

7x + 5y = 35

(Figure 3.4.6). This can be seen by eliminating the parameter t from the parametric equations (verify).
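The claim that both orderings of the two points describe the same line can be spot-checked by sampling parameter values and testing each point against 7x + 5y = 35. The Python sketch below is illustrative only (the names `two_point` and `on_line` are assumptions); `fractions.Fraction` is used so that a non-integer parameter is still handled exactly.

```python
from fractions import Fraction


def two_point(x0, x1, t):
    """Point (1 - t) x0 + t x1 on the line through x0 and x1 (two-point form)."""
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))


def on_line(point):
    """True if the point satisfies 7x + 5y = 35."""
    x, y = point
    return 7 * x + 5 * y == 35


P, Q = (0, 7), (5, 0)
ts = [-1, 0, Fraction(1, 2), 1, 2]

# Taking x0 = P, x1 = Q and then x0 = Q, x1 = P gives points on the same line.
print(all(on_line(two_point(P, Q, t)) for t in ts))  # True
print(all(on_line(two_point(Q, P, t)) for t in ts))  # True
```

The two parametrizations traverse the line in opposite directions, which is why the same value of t generally produces different points in the two forms.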




Figure 3.4.5 



Figure 3.4.6 



The point x = (x, y) in Equations 9 and 10 traces an entire line in R^2 as the parameter t varies over the interval (-∞, ∞). If, however,
we restrict the parameter to vary from t = 0 to t = 1, then x will not trace the entire line but rather just the line segment joining the points
x0 and x1. The point x will start at x0 when t = 0 and end at x1 when t = 1. Accordingly, we make the following definition.

DEFINITION 3

If x0 and x1 are vectors in R^n, then the equation

x = x0 + t(x1 - x0) (0 ≤ t ≤ 1) (13)

defines the line segment from x0 to x1. When convenient, Equation 13 can be written as

x = (1 - t)x0 + tx1 (0 ≤ t ≤ 1) (14)


EXAMPLE 5 A Line Segment from One Point to Another in R^2

It follows from 13 and 14 that the line segment in R^2 from x0 = (1, -3) to x1 = (5, 6) can be represented either by the
equation

x = (1, -3) + t(4, 9) (0 ≤ t ≤ 1)

or by

x = (1 - t)(1, -3) + t(5, 6) (0 ≤ t ≤ 1)
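The two representations of a line segment in Example 5 are algebraically equivalent, and a small computation makes that concrete. The following Python sketch is illustrative (the function names are assumptions); it enforces the restriction 0 ≤ t ≤ 1 and compares the two forms at several parameter values.

```python
from fractions import Fraction


def segment_point(x0, x1, t):
    """Point x0 + t (x1 - x0) on the segment from x0 to x1, as in Formula 13."""
    if not 0 <= t <= 1:
        raise ValueError("t must satisfy 0 <= t <= 1 for a line segment")
    return tuple(a + t * (b - a) for a, b in zip(x0, x1))


def segment_point_alt(x0, x1, t):
    """Same point written as (1 - t) x0 + t x1, as in Formula 14."""
    if not 0 <= t <= 1:
        raise ValueError("t must satisfy 0 <= t <= 1 for a line segment")
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))


x0, x1 = (1, -3), (5, 6)
print(segment_point(x0, x1, 0))  # (1, -3), the starting point
print(segment_point(x0, x1, 1))  # (5, 6), the endpoint

# The two forms agree at every sampled parameter value.
ts = [Fraction(k, 4) for k in range(5)]
print(all(segment_point(x0, x1, t) == segment_point_alt(x0, x1, t) for t in ts))  # True
```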



Dot Product Form of a Linear System 

Our next objective is to show how to express linear equations and linear systems in dot product notation. This will lead us to some
important results about orthogonality and linear systems.

Recall that a linear equation in the variables x1, x2, ..., xn has the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \quad (a_1, a_2, \ldots, a_n \text{ not all zero}) \qquad (15)$$

and that the corresponding homogeneous equation is

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0 \quad (a_1, a_2, \ldots, a_n \text{ not all zero}) \qquad (16)$$

These equations can be rewritten in vector form by letting

a = (a1, a2, ..., an) and x = (x1, x2, ..., xn)

in which case Formula 15 can be written as

a • x = b (17)

and Formula 16 as

a • x = 0 (18)

Except for a notational change from n to a, Formula 18 is the extension to R^n of Formula 6 in Section 3.3. This equation reveals that each
solution vector x of a homogeneous equation is orthogonal to the coefficient vector a. To take this geometric observation a step further,
consider the homogeneous system

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0
\end{aligned}$$

If we denote the successive row vectors of the coefficient matrix by r1, r2, ..., rm, then we can rewrite this system in dot product form as

$$\begin{aligned}
\mathbf{r}_1 \cdot \mathbf{x} &= 0\\
\mathbf{r}_2 \cdot \mathbf{x} &= 0\\
&\;\;\vdots\\
\mathbf{r}_m \cdot \mathbf{x} &= 0
\end{aligned} \qquad (19)$$

from which we see that every solution vector x is orthogonal to every row vector of the coefficient matrix. In summary, we have the
following result.



THEOREM 3.4.3

If A is an m × n matrix, then the solution set of the homogeneous linear system Ax = 0 consists of all vectors in R^n that are
orthogonal to every row vector of A.



EXAMPLE 6 Orthogonality of Row Vectors and Solution Vectors

We showed in Example 6 of Section 1.2 that the general solution of the homogeneous linear system

$x_1 + 3x_2 - 2x_3 + 2x_5 = 0$
$2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 = 0$
$5x_3 + 10x_4 + 15x_6 = 0$
$2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 = 0$

is

$x_1 = -3r - 4s - 2t,\ x_2 = r,\ x_3 = -2s,\ x_4 = s,\ x_5 = t,\ x_6 = 0$

which we can rewrite in vector form as

$\mathbf{x} = (-3r - 4s - 2t,\ r,\ -2s,\ s,\ t,\ 0)$

According to Theorem 3.4.3, the vector $\mathbf{x}$ must be orthogonal to each of the row vectors

$\mathbf{r}_1 = (1, 3, -2, 0, 2, 0)$
$\mathbf{r}_2 = (2, 6, -5, -2, 4, -3)$
$\mathbf{r}_3 = (0, 0, 5, 10, 0, 15)$
$\mathbf{r}_4 = (2, 6, 0, 8, 4, 18)$

We will confirm that $\mathbf{x}$ is orthogonal to $\mathbf{r}_1$, and leave it for you to verify that $\mathbf{x}$ is orthogonal to the other three row vectors as
well. The dot product of $\mathbf{r}_1$ and $\mathbf{x}$ is

$\mathbf{r}_1 \cdot \mathbf{x} = 1(-3r - 4s - 2t) + 3(r) + (-2)(-2s) + 0(s) + 2(t) + 0(0) = 0$
which establishes the orthogonality. 
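The remaining three checks can also be done mechanically. The following is a minimal Python sketch, not part of the original text: it samples integer values of the parameters r, s, t and confirms that the solution vector of Example 6 is orthogonal to all four row vectors. The helper names `dot`, `solution`, and `all_rows_orthogonal` are illustrative assumptions.

```python
from itertools import product

# Row vectors of the coefficient matrix, as listed in Example 6.
ROWS = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]


def dot(a, b):
    """Euclidean dot product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def solution(r, s, t):
    """General solution x of the homogeneous system for parameters r, s, t."""
    return (-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, 0)


def all_rows_orthogonal(x):
    """True when x is orthogonal to every row vector (Theorem 3.4.3)."""
    return all(dot(row, x) == 0 for row in ROWS)


# Sample a small grid of parameter values; every one should pass.
checked = [all_rows_orthogonal(solution(r, s, t))
           for r, s, t in product(range(-2, 3), repeat=3)]
print(all(checked))
```

A sampled check is not a proof, but a failure for any parameter choice would reveal an error in the general solution.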



The Relationship Between Ax = 0 and Ax = b 



We will conclude this section by exploring the relationship between the solutions of a homogeneous linear system $A\mathbf{x} = \mathbf{0}$ and the solutions
(if any) of a nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ that has the same coefficient matrix. These are called corresponding linear systems.

To motivate the result we are seeking, let us compare the solutions of the corresponding linear systems

$x_1 + 3x_2 - 2x_3 + 2x_5 = 0$
$2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 = 0$
$5x_3 + 10x_4 + 15x_6 = 0$
$2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 = 0$

and

$x_1 + 3x_2 - 2x_3 + 2x_5 = 0$
$2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 = -1$
$5x_3 + 10x_4 + 15x_6 = 5$
$2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 = 6$

We showed in Example 5 and Example 6 of Section 1.2 that the general solutions of these linear systems can be written in parametric form
as

homogeneous: $x_1 = -3r - 4s - 2t,\ x_2 = r,\ x_3 = -2s,\ x_4 = s,\ x_5 = t,\ x_6 = 0$

nonhomogeneous: $x_1 = -3r - 4s - 2t,\ x_2 = r,\ x_3 = -2s,\ x_4 = s,\ x_5 = t,\ x_6 = \tfrac{1}{3}$

which we can then rewrite in vector form as

homogeneous: $(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t,\ r,\ -2s,\ s,\ t,\ 0)$

nonhomogeneous: $(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t,\ r,\ -2s,\ s,\ t,\ \tfrac{1}{3})$

By splitting the vectors on the right apart and collecting terms with like parameters, we can rewrite these equations as

homogeneous: $(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0)$ (20)

nonhomogeneous: $(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) + (0, 0, 0, 0, 0, \tfrac{1}{3})$ (21)

Formulas 20 and 21 reveal that each solution of the nonhomogeneous system can be obtained by adding the fixed vector $(0, 0, 0, 0, 0, \tfrac{1}{3})$
to the corresponding solution of the homogeneous system. This is a special case of the following general result.



THEOREM 3.4.4 

The general solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding any specific solution of $A\mathbf{x} = \mathbf{b}$ to the general
solution of $A\mathbf{x} = \mathbf{0}$.



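Proof Let $\mathbf{x}_0$ be any specific solution of $A\mathbf{x} = \mathbf{b}$, let $W$ denote the solution set of $A\mathbf{x} = \mathbf{0}$, and let $\mathbf{x}_0 + W$ denote the set of all vectors that
result by adding $\mathbf{x}_0$ to each vector in $W$. We must show that if $\mathbf{x}$ is a vector in $\mathbf{x}_0 + W$, then $\mathbf{x}$ is a solution of $A\mathbf{x} = \mathbf{b}$, and conversely, that
every solution of $A\mathbf{x} = \mathbf{b}$ is in the set $\mathbf{x}_0 + W$.

Assume first that $\mathbf{x}$ is a vector in $\mathbf{x}_0 + W$. This implies that $\mathbf{x}$ is expressible in the form $\mathbf{x} = \mathbf{x}_0 + \mathbf{w}$, where $A\mathbf{x}_0 = \mathbf{b}$ and $A\mathbf{w} = \mathbf{0}$. Thus,

$A\mathbf{x} = A(\mathbf{x}_0 + \mathbf{w}) = A\mathbf{x}_0 + A\mathbf{w} = \mathbf{b} + \mathbf{0} = \mathbf{b}$

which shows that $\mathbf{x}$ is a solution of $A\mathbf{x} = \mathbf{b}$.

Conversely, let $\mathbf{x}$ be any solution of $A\mathbf{x} = \mathbf{b}$. To show that $\mathbf{x}$ is in the set $\mathbf{x}_0 + W$ we must show that $\mathbf{x}$ is expressible in the form

$\mathbf{x} = \mathbf{x}_0 + \mathbf{w}$ (22)

where $\mathbf{w}$ is in $W$ (i.e., $A\mathbf{w} = \mathbf{0}$). We can do this by taking $\mathbf{w} = \mathbf{x} - \mathbf{x}_0$. This vector obviously satisfies 22, and it is in $W$ since

$A\mathbf{w} = A(\mathbf{x} - \mathbf{x}_0) = A\mathbf{x} - A\mathbf{x}_0 = \mathbf{b} - \mathbf{b} = \mathbf{0}$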

Figure 3.4.7 The solution set of $A\mathbf{x} = \mathbf{b}$ is a translation of the solution space of $A\mathbf{x} = \mathbf{0}$.



Remark Theorem 3.4.4 has a useful geometric interpretation that is illustrated in Figure 3.4.7. If, as discussed in Section 3.1, we interpret
vector addition as translation, then the theorem states that if $\mathbf{x}_0$ is any specific solution of $A\mathbf{x} = \mathbf{b}$, then the entire solution set of $A\mathbf{x} = \mathbf{b}$ can
be obtained by translating the solution set of $A\mathbf{x} = \mathbf{0}$ by the vector $\mathbf{x}_0$.
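The translation picture can be tried on a very small system. The sketch below is an illustration, not part of the original text: it uses the single equation x + y + z = 1 as an assumed example, with the specific solution (1, 0, 0) and the homogeneous solutions s(-1, 1, 0) + t(-1, 0, 1). The function names are hypothetical.

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Assumed example: the one-equation system x + y + z = 1.
A = [[1, 1, 1]]
b = [1]
x0 = (1, 0, 0)  # one specific solution of A x = b


def homogeneous(s, t):
    """General solution of A x = 0: s(-1, 1, 0) + t(-1, 0, 1)."""
    return (-s - t, s, t)


def translate(p, w):
    """Add the vector p to the vector w componentwise."""
    return tuple(a + c for a, c in zip(p, w))


# Direction 1: every x0 + w solves A x = b.
w = homogeneous(2, -5)
x = translate(x0, w)
print(matvec(A, w), matvec(A, x))

# Direction 2: for any other solution y of A x = b, y - x0 solves A x = 0.
y = (4, -1, -2)
diff = tuple(a - c for a, c in zip(y, x0))
print(matvec(A, y), matvec(A, diff))
```

The two directions mirror the two halves of the proof of Theorem 3.4.4.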



Concept Review 

• Parameters 

• Parametric equations of lines 

• Parametric equations of planes 

• Two-point vector equations of a line 

• Vector equation of a line 

• Vector equation of a plane 

Skills 

• Express the equations of lines in $R^2$ and $R^3$ using either vector or parametric equations.

• Express the equations of planes in $R^3$ using either vector or parametric equations.

• Express the equation of a line containing two given points in $R^2$ or $R^3$ using either vector or parametric equations.

• Find equations of a line and a line segment.

• Verify the orthogonality of the row vectors of a linear system of equations and a solution vector.

• Use a specific solution to the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ and the general solution of the corresponding linear system
$A\mathbf{x} = \mathbf{0}$ to obtain the general solution to $A\mathbf{x} = \mathbf{b}$.



Exercise Set 3.4 

In Exercises 1-4, find vector and parametric equations of the line containing the point and parallel to the vector. 
1. Point: $(-4, 1)$; vector: $\mathbf{v} = (0, -8)$
Answer:

Vector equation: $(x, y) = (-4, 1) + t(0, -8)$;
parametric equations: $x = -4,\ y = 1 - 8t$

2. Point: $(2, -1)$; vector: $\mathbf{v} = (-4, -2)$

3. Point: $(0, 0, 0)$; vector: $\mathbf{v} = (-3, 0, 1)$

Answer:

Vector equation: $(x, y, z) = t(-3, 0, 1)$;

parametric equations: $x = -3t,\ y = 0,\ z = t$

4. Point: $(-9, 3, 4)$; vector: $\mathbf{v} = (-1, 6, 0)$

In Exercises 5-8, use the given equation of a line to find a point on the line and a vector parallel to the line. 

5. $\mathbf{x} = (3 - 5t,\ -6 - t)$
Answer:

Point: $(3, -6)$; parallel vector: $(-5, -1)$

7. $\mathbf{x} = (1 - t)(4, 6) + t(-2, 0)$

Answer:

Point: $(4, 6)$; parallel vector: $(-6, -6)$

8. $\mathbf{x} = (1 - t)(0, -5, 1)$

In Exercises 9-12, find vector and parametric equations of the plane containing the given point and parallel vectors. 

9. Point: $(-3, 1, 0)$; vectors: $\mathbf{v}_1 = (0, -3, 6)$ and $\mathbf{v}_2 = (-5, 1, 2)$
Answer:

Vector equation: $(x, y, z) = (-3, 1, 0) + t_1(0, -3, 6) + t_2(-5, 1, 2)$;

parametric equations: $x = -3 - 5t_2,\ y = 1 - 3t_1 + t_2,\ z = 6t_1 + 2t_2$

10. Point: $(0, 6, -2)$; vectors: $\mathbf{v}_1 = (0, 9, -1)$ and $\mathbf{v}_2 = (0, -3, 0)$

11. Point: $(-1, 1, 4)$; vectors: $\mathbf{v}_1 = (6, -1, 0)$ and $\mathbf{v}_2 = (-1, 3, 1)$

Answer:

Vector equation: $(x, y, z) = (-1, 1, 4) + t_1(6, -1, 0) + t_2(-1, 3, 1)$;

parametric equations: $x = -1 + 6t_1 - t_2,\ y = 1 - t_1 + 3t_2,\ z = 4 + t_2$

12. Point: $(0, 5, -4)$; vectors: $\mathbf{v}_1 = (0, 0, -5)$ and $\mathbf{v}_2 = (1, -3, -2)$

In Exercises 13-14, find vector and parametric equations of the line in $R^2$ that passes through the origin and is orthogonal to $\mathbf{v}$.

13. $\mathbf{v} = (-2, 3)$
Answer:

A possible answer is vector equation: $(x, y) = t(3, 2)$;

parametric equations: $x = 3t,\ y = 2t$

14. $\mathbf{v} = (1, -4)$

In Exercises 15-16, find vector and parametric equations of the plane in $R^3$ that passes through the origin and is orthogonal to $\mathbf{v}$.

15. $\mathbf{v} = (4, 0, -5)$ [Hint: Construct two nonparallel vectors orthogonal to $\mathbf{v}$ in $R^3$.]
Answer:

A possible answer is vector equation: $(x, y, z) = t_1(0, 1, 0) + t_2(5, 0, 4)$;

parametric equations: $x = 5t_2,\ y = t_1,\ z = 4t_2$

16. $\mathbf{v} = (3, 1, -6)$

In Exercises 17-20, find the general solution to the linear system and confirm that the row vectors of the coefficient matrix are orthogonal 
to the solution vectors. 

17. $x_1 + x_2 + x_3 = 0$
$2x_1 + 2x_2 + 2x_3 = 0$
$3x_1 + 3x_2 + 3x_3 = 0$

Answer:

$x_1 = -s - t,\ x_2 = s,\ x_3 = t$

18. $x_1 + 3x_2 - 4x_3 = 0$
$2x_1 + 6x_2 - 8x_3 = 0$

19. $x_1 + 5x_2 + x_3 + 2x_4 - x_5 = 0$
$x_1 - 2x_2 - x_3 + 3x_4 + 2x_5 = 0$

Answer:

$x_1 = \tfrac{3}{7}r - \tfrac{19}{7}s - \tfrac{8}{7}t,\ x_2 = -\tfrac{2}{7}r + \tfrac{1}{7}s + \tfrac{3}{7}t,\ x_3 = r,\ x_4 = s,\ x_5 = t$

20. $x_1 + 3x_2 - 4x_3 = 0$
$x_1 + 2x_2 + 3x_3 = 0$

21. (a) The equation $x + y + z = 1$ can be viewed as a linear system of one equation in three unknowns. Express a general solution of this
equation as a particular solution plus a general solution of the associated homogeneous system.

(b) Give a geometric interpretation of the result in part (a).

Answer:

(a) $(x, y, z) = (1, 0, 0) + s(-1, 1, 0) + t(-1, 0, 1)$

(b) a plane in $R^3$ passing through $P(1, 0, 0)$ and parallel to $(-1, 1, 0)$ and $(-1, 0, 1)$

22. (a) The equation $x + y = 1$ can be viewed as a linear system of one equation in two unknowns. Express a general solution of this
equation as a particular solution plus a general solution of the associated homogeneous system.

(b) Give a geometric interpretation of the result in part (a).

23. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in $R^3$ that are
orthogonal to $\mathbf{a} = (1, 1, 1)$ and $\mathbf{b} = (-2, 3, 0)$.

(b) What kind of geometric object is the solution space?

(c) Find a general solution of the system obtained in part (a), and confirm that Theorem 3.4.3 holds.
Answer:

(a) $x + y + z = 0$
$-2x + 3y = 0$

(b) a line through the origin in $R^3$

24. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in $R^3$ that are
orthogonal to $\mathbf{a} = (-3, 2, -1)$ and $\mathbf{b} = (0, -2, -2)$.

(b) What kind of geometric object is the solution space?



(c) Find a general solution of the system obtained in part (a), and confirm that Theorem 3.4.3 holds. 
25. Consider the linear systems

$\begin{bmatrix} 3 & 2 & -1 \\ 6 & 4 & -2 \\ -3 & -2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$

and

$\begin{bmatrix} 3 & 2 & -1 \\ 6 & 4 & -2 \\ -3 & -2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ -2 \end{bmatrix}$

(a) Find a general solution of the homogeneous system.

(b) Confirm that $x_1 = 1,\ x_2 = 0,\ x_3 = 1$ is a solution of the nonhomogeneous system.

(c) Use the results in parts (a) and (b) to find a general solution of the nonhomogeneous system.

(d) Check your result in part (c) by solving the nonhomogeneous system directly.

Answer:

(a) $x_1 = -\tfrac{2}{3}s + \tfrac{1}{3}t,\ x_2 = s,\ x_3 = t$



26. Consider the linear systems

[The displayed homogeneous and nonhomogeneous systems are missing from the source.]

(a) Find a general solution of the homogeneous system.

(b) Confirm that $x_1 = 1,\ x_2 = 1,\ x_3 = 1$ is a solution of the nonhomogeneous system.

(c) Use the results in parts (a) and (b) to find a general solution of the nonhomogeneous system. 

(d) Check your result in part (c) by solving the nonhomogeneous system directly. 

In Exercises 27-28, find a general solution of the system, and use that solution to find a general solution of the associated homogeneous 
system and a particular solution of the given system. 



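27. $\begin{bmatrix} 3 & 4 & 1 & 2 \\ 6 & 8 & 2 & 5 \\ 9 & 12 & 3 & 10 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 3 \\ 7 \\ 13 \end{bmatrix}$

Answer:

$x_1 = \tfrac{1}{3} - \tfrac{4}{3}s - \tfrac{1}{3}t,\ x_2 = s,\ x_3 = t,\ x_4 = 1$; The general solution of the associated homogeneous system is

$x_1 = -\tfrac{4}{3}s - \tfrac{1}{3}t,\ x_2 = s,\ x_3 = t,\ x_4 = 0$. A particular solution of the given system is $x_1 = \tfrac{1}{3},\ x_2 = 0,\ x_3 = 0,\ x_4 = 1$.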



28.

9 -3 5 6
6 -2 3 1
3 -1 3 14



True-False Exercises 



In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 



(a) The vector equation of a line can be determined from any point lying on the line and a nonzero vector parallel to the line. 
Answer: 

True 

(b) The vector equation of a plane can be determined from any point lying in the plane and a nonzero vector parallel to the plane. 
Answer: 

False 

(c) The points lying on a line through the origin in $R^2$ or $R^3$ are all scalar multiples of any nonzero vector on the line.
Answer:

True

(d) All solution vectors of the linear system $A\mathbf{x} = \mathbf{b}$ are orthogonal to the row vectors of the matrix $A$ if and only if $\mathbf{b} = \mathbf{0}$.
Answer:

True

(e) The general solution of the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding $\mathbf{b}$ to the general solution of the
homogeneous linear system $A\mathbf{x} = \mathbf{0}$.

Answer:

False

(f) If $\mathbf{x}_1$ and $\mathbf{x}_2$ are two solutions of the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$, then $\mathbf{x}_1 - \mathbf{x}_2$ is a solution of the corresponding
homogeneous linear system.

Answer: 

True 




3.5 Cross Product 



This optional section is concerned with properties of vectors in 3-space that are important to physicists and
engineers. It can be omitted, if desired, since subsequent sections do not depend on its content. Among other
things, we define an operation that provides a way of constructing a vector in 3-space that is perpendicular to two
given vectors, and we give a geometric interpretation of $3 \times 3$ determinants.



Cross Product of Vectors 

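In Section 3.2 we defined the dot product of two vectors $\mathbf{u}$ and $\mathbf{v}$ in $n$-space. That operation produced a scalar as its
result. We will now define a type of vector multiplication that produces a vector as the result but which is
applicable only to vectors in 3-space.

DEFINITION 1

If $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ are vectors in 3-space, then the cross product $\mathbf{u} \times \mathbf{v}$ is the vector
defined by

$\mathbf{u} \times \mathbf{v} = (u_2 v_3 - u_3 v_2,\ u_3 v_1 - u_1 v_3,\ u_1 v_2 - u_2 v_1)$

or, in determinant notation,

$\mathbf{u} \times \mathbf{v} = \left( \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix},\ -\begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix},\ \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix} \right)$ (1)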



Remark Instead of memorizing 1, you can obtain the components of $\mathbf{u} \times \mathbf{v}$ as follows:

• Form the $2 \times 3$ matrix $\begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}$ whose first row contains the components of $\mathbf{u}$ and whose second row
contains the components of $\mathbf{v}$.

• To find the first component of $\mathbf{u} \times \mathbf{v}$, delete the first column and take the determinant; to find the second
component, delete the second column and take the negative of the determinant; and to find the third component,
delete the third column and take the determinant.



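EXAMPLE 1 Calculating a Cross Product

Find $\mathbf{u} \times \mathbf{v}$, where $\mathbf{u} = (1, 2, -2)$ and $\mathbf{v} = (3, 0, 1)$.

Solution From either 1 or the mnemonic in the preceding remark, we have

$\mathbf{u} \times \mathbf{v} = \left( \begin{vmatrix} 2 & -2 \\ 0 & 1 \end{vmatrix},\ -\begin{vmatrix} 1 & -2 \\ 3 & 1 \end{vmatrix},\ \begin{vmatrix} 1 & 2 \\ 3 & 0 \end{vmatrix} \right) = (2, -7, -6)$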
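Both routes to the cross product are easy to mechanize. The following Python sketch is an illustration rather than part of the text: `cross_formula` transcribes the component formula of Definition 1 and `cross_mnemonic` follows the delete-a-column rule from the remark. Both function names are assumptions made for this example.

```python
def cross_formula(u, v):
    """Cross product from the component formula in Definition 1."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def cross_mnemonic(u, v):
    """Cross product via the 2 x 3 matrix mnemonic.

    For each column, delete it, take the 2 x 2 determinant of what is
    left, and negate the result for the second column.
    """
    rows = (tuple(u), tuple(v))
    components = []
    for col in range(3):
        keep = [j for j in range(3) if j != col]
        a, b = rows[0][keep[0]], rows[0][keep[1]]
        c, d = rows[1][keep[0]], rows[1][keep[1]]
        det = a * d - b * c
        components.append(-det if col == 1 else det)
    return tuple(components)


u = (1, 2, -2)
v = (3, 0, 1)
print(cross_formula(u, v))   # (2, -7, -6)
print(cross_mnemonic(u, v))  # (2, -7, -6)
```

Both calls reproduce the result of Example 1, which is a useful sanity check on the sign of the middle component.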



The following theorem gives some important relationships between the dot product and cross product and also 
shows that u x v is orthogonal to both u and v. 

Historical Note The cross product notation $A \times B$ was introduced by the American physicist and
mathematician J. Willard Gibbs (see p. 134) in a series of unpublished lecture notes for his students at Yale
University. It appeared in a published work for the first time in the second edition of the book Vector
Analysis by Edwin Wilson (1879-1964), a student of Gibbs. Gibbs originally referred to
$A \times B$ as the "skew product."



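THEOREM 3.5.1 Relationships Involving Cross Product and Dot Product

If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in 3-space, then

(a) $\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = 0$ ($\mathbf{u} \times \mathbf{v}$ is orthogonal to $\mathbf{u}$)

(b) $\mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = 0$ ($\mathbf{u} \times \mathbf{v}$ is orthogonal to $\mathbf{v}$)

(c) $\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$ (Lagrange's identity)

(d) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{u} \cdot \mathbf{v})\mathbf{w}$ (relationship between cross and dot products)

(e) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u}$ (relationship between cross and dot products)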



Proof (a) Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$. Then

$\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = (u_1, u_2, u_3) \cdot (u_2 v_3 - u_3 v_2,\ u_3 v_1 - u_1 v_3,\ u_1 v_2 - u_2 v_1)$

$= u_1(u_2 v_3 - u_3 v_2) + u_2(u_3 v_1 - u_1 v_3) + u_3(u_1 v_2 - u_2 v_1) = 0$

Proof (b) Similar to (a).

Proof (c) Since

$\|\mathbf{u} \times \mathbf{v}\|^2 = (u_2 v_3 - u_3 v_2)^2 + (u_3 v_1 - u_1 v_3)^2 + (u_1 v_2 - u_2 v_1)^2$ (2)

and

$\|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 = (u_1^2 + u_2^2 + u_3^2)(v_1^2 + v_2^2 + v_3^2) - (u_1 v_1 + u_2 v_2 + u_3 v_3)^2$ (3)

the proof can be completed by "multiplying out" the right sides of 2 and 3 and verifying their equality.

Proof (d) and (e) See Exercises 38 and 39.



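EXAMPLE 2 $\mathbf{u} \times \mathbf{v}$ Is Perpendicular to $\mathbf{u}$ and to $\mathbf{v}$

Consider the vectors

$\mathbf{u} = (1, 2, -2)$ and $\mathbf{v} = (3, 0, 1)$

In Example 1 we showed that

$\mathbf{u} \times \mathbf{v} = (2, -7, -6)$

Since

$\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = (1)(2) + (2)(-7) + (-2)(-6) = 0$

and

$\mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = (3)(2) + (0)(-7) + (1)(-6) = 0$

$\mathbf{u} \times \mathbf{v}$ is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$, as guaranteed by Theorem 3.5.1.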
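All five parts of Theorem 3.5.1 can be spot-checked with integer arithmetic. The sketch below is illustrative only: it uses the vectors of Example 2 together with a third vector w = (2, 6, 7) chosen here as an assumption, and the helper names are hypothetical. A passing check on one triple is evidence, not a proof.

```python
def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two vectors in 3-space (Definition 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def scale(k, v):
    """Scalar multiple k v."""
    return tuple(k * a for a in v)


def sub(u, v):
    """Vector difference u - v."""
    return tuple(a - b for a, b in zip(u, v))


def check_theorem_3_5_1(u, v, w):
    """Return a dict saying whether parts (a)-(e) hold for u, v, w."""
    uv = cross(u, v)
    return {
        "a": dot(u, uv) == 0,
        "b": dot(v, uv) == 0,
        "c": dot(uv, uv) == dot(u, u) * dot(v, v) - dot(u, v) ** 2,
        "d": cross(u, cross(v, w)) == sub(scale(dot(u, w), v), scale(dot(u, v), w)),
        "e": cross(uv, w) == sub(scale(dot(u, w), v), scale(dot(v, w), u)),
    }


result = check_theorem_3_5_1((1, 2, -2), (3, 0, 1), (2, 6, 7))
print(result)
```

For the vectors of Example 2, part (c) reads 89 = 9 · 10 - 1, which matches the squared norm of (2, -7, -6).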






Joseph Louis Lagrange (1736-1813) 

Historical Note Joseph Louis Lagrange was a French-Italian mathematician and astronomer. Although 
his father wanted him to become a lawyer, Lagrange was attracted to mathematics and astronomy after 
reading a memoir by the astronomer Halley. At age 16 he began to study mathematics on his own and by 
age 19 was appointed to a professorship at the Royal Artillery School in Turin. The following year he 
solved some famous problems using new methods that eventually blossomed into a branch of mathematics 
called the calculus of variations. These methods and Lagrange's applications of them to problems in 
celestial mechanics were so monumental that by age 25 he was regarded by many of his contemporaries as 
the greatest living mathematician. One of Lagrange's most famous works is a memoir, Mécanique
Analytique, in which he reduced the theory of mechanics to a few general formulas from which all other
necessary equations could be derived. Napoleon was a great admirer of Lagrange and showered him with 
many honors. In spite of his fame, Lagrange was a shy and modest man. On his death, he was buried with 
honor in the Pantheon. 
[Image: ©SSPL/The Image Works]



The main arithmetic properties of the cross product are listed in the next theorem. 



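THEOREM 3.5.2 Properties of Cross Product

If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are any vectors in 3-space and $k$ is any scalar, then:

(a) $\mathbf{u} \times \mathbf{v} = -(\mathbf{v} \times \mathbf{u})$

(b) $\mathbf{u} \times (\mathbf{v} + \mathbf{w}) = (\mathbf{u} \times \mathbf{v}) + (\mathbf{u} \times \mathbf{w})$

(c) $(\mathbf{u} + \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \times \mathbf{w}) + (\mathbf{v} \times \mathbf{w})$

(d) $k(\mathbf{u} \times \mathbf{v}) = (k\mathbf{u}) \times \mathbf{v} = \mathbf{u} \times (k\mathbf{v})$

(e) $\mathbf{u} \times \mathbf{0} = \mathbf{0} \times \mathbf{u} = \mathbf{0}$

(f) $\mathbf{u} \times \mathbf{u} = \mathbf{0}$

The proofs follow immediately from Formula 1 and properties of determinants; for example, part (a) can be proved
as follows.

Proof (a) Interchanging $\mathbf{u}$ and $\mathbf{v}$ in 1 interchanges the rows of the three determinants on the right side of 1 and
hence changes the sign of each component in the cross product. Thus $\mathbf{u} \times \mathbf{v} = -(\mathbf{v} \times \mathbf{u})$.

The proofs of the remaining parts are left as exercises.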

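EXAMPLE 3 Standard Unit Vectors

Consider the vectors

$\mathbf{i} = (1, 0, 0),\quad \mathbf{j} = (0, 1, 0),\quad \mathbf{k} = (0, 0, 1)$

These vectors each have length 1 and lie along the coordinate axes (Figure 3.5.1). They are called the
standard unit vectors in 3-space. Every vector $\mathbf{v} = (v_1, v_2, v_3)$ in 3-space is expressible in terms of
$\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ since we can write

$\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$

For example,

$(2, -3, 4) = 2\mathbf{i} - 3\mathbf{j} + 4\mathbf{k}$

From 1 we obtain

$\mathbf{i} \times \mathbf{j} = \left( \begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix},\ -\begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix},\ \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} \right) = (0, 0, 1) = \mathbf{k}$

Figure 3.5.1 The standard unit vectors

You should have no trouble obtaining the following results:

$\mathbf{i} \times \mathbf{i} = \mathbf{0} \quad \mathbf{j} \times \mathbf{j} = \mathbf{0} \quad \mathbf{k} \times \mathbf{k} = \mathbf{0}$

$\mathbf{i} \times \mathbf{j} = \mathbf{k} \quad \mathbf{j} \times \mathbf{k} = \mathbf{i} \quad \mathbf{k} \times \mathbf{i} = \mathbf{j}$

$\mathbf{j} \times \mathbf{i} = -\mathbf{k} \quad \mathbf{k} \times \mathbf{j} = -\mathbf{i} \quad \mathbf{i} \times \mathbf{k} = -\mathbf{j}$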

Figure 3.5.2 is helpful for remembering these results. Referring to this diagram, the cross product of two 
consecutive vectors going clockwise is the next vector around, and the cross product of two consecutive vectors 
going counterclockwise is the negative of the next vector around. 



Figure 3.5.2



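Determinant Form of Cross Product

It is also worth noting that a cross product can be represented symbolically in the form

$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix}\mathbf{i} - \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix}\mathbf{j} + \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}\mathbf{k}$ (4)

For example, if $\mathbf{u} = (1, 2, -2)$ and $\mathbf{v} = (3, 0, 1)$, then

$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2 & -2 \\ 3 & 0 & 1 \end{vmatrix} = 2\mathbf{i} - 7\mathbf{j} - 6\mathbf{k}$

which agrees with the result obtained in Example 1.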



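WARNING

It is not true in general that $\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$. For example,

$\mathbf{i} \times (\mathbf{j} \times \mathbf{j}) = \mathbf{i} \times \mathbf{0} = \mathbf{0}$

and

$(\mathbf{i} \times \mathbf{j}) \times \mathbf{j} = \mathbf{k} \times \mathbf{j} = -\mathbf{i}$

so

$\mathbf{i} \times (\mathbf{j} \times \mathbf{j}) \ne (\mathbf{i} \times \mathbf{j}) \times \mathbf{j}$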
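The counterexample in the warning can be reproduced directly. This short Python sketch is illustrative, with `cross` a hypothetical helper transcribing Formula 1; it evaluates both groupings for the standard unit vectors.

```python
def cross(u, v):
    """Cross product of two vectors in 3-space (Formula 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


# Standard unit vectors in 3-space.
i = (1, 0, 0)
j = (0, 1, 0)
k = (0, 0, 1)

left = cross(i, cross(j, j))   # i x (j x j) = i x 0 = 0
right = cross(cross(i, j), j)  # (i x j) x j = k x j = -i

print(left)   # (0, 0, 0)
print(right)  # (-1, 0, 0)
```

Because the two results differ, parentheses in an iterated cross product cannot be dropped or moved.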



We know from Theorem 3.5.1 that $\mathbf{u} \times \mathbf{v}$ is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$. If $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors, it can be
shown that the direction of $\mathbf{u} \times \mathbf{v}$ can be determined using the following "right-hand rule" (Figure 3.5.3): Let $\theta$ be
the angle between $\mathbf{u}$ and $\mathbf{v}$, and suppose $\mathbf{u}$ is rotated through the angle $\theta$ until it coincides with $\mathbf{v}$. If the fingers of
the right hand are cupped so that they point in the direction of rotation, then the thumb indicates (roughly) the
direction of $\mathbf{u} \times \mathbf{v}$.

Figure 3.5.3

You may find it instructive to practice this rule with the products

$\mathbf{i} \times \mathbf{j} = \mathbf{k},\quad \mathbf{j} \times \mathbf{k} = \mathbf{i},\quad \mathbf{k} \times \mathbf{i} = \mathbf{j}$



Geometric Interpretation of Cross Product 

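If $\mathbf{u}$ and $\mathbf{v}$ are vectors in 3-space, then the norm of $\mathbf{u} \times \mathbf{v}$ has a useful geometric interpretation. Lagrange's identity,
given in Theorem 3.5.1, states that

$\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$ (5)

If $\theta$ denotes the angle between $\mathbf{u}$ and $\mathbf{v}$, then $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta$, so 5 can be rewritten as

$\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \cos^2\theta$
$= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 (1 - \cos^2\theta)$
$= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \sin^2\theta$

Since $0 \le \theta \le \pi$, it follows that $\sin\theta \ge 0$, so this can be rewritten as

$\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\| \|\mathbf{v}\| \sin\theta$ (6)

But $\|\mathbf{v}\| \sin\theta$ is the altitude of the parallelogram determined by $\mathbf{u}$ and $\mathbf{v}$ (Figure 3.5.4). Thus, from 6, the area $A$ of
this parallelogram is given by

$A = (\text{base})(\text{altitude}) = \|\mathbf{u}\| \|\mathbf{v}\| \sin\theta = \|\mathbf{u} \times \mathbf{v}\|$

This result is even correct if $\mathbf{u}$ and $\mathbf{v}$ are collinear, since the parallelogram determined by $\mathbf{u}$ and $\mathbf{v}$ has zero area and
from 6 we have $\mathbf{u} \times \mathbf{v} = \mathbf{0}$ because $\theta = 0$ in this case. Thus we have the following theorem.

THEOREM 3.5.3 Area of a Parallelogram

If $\mathbf{u}$ and $\mathbf{v}$ are vectors in 3-space, then $\|\mathbf{u} \times \mathbf{v}\|$ is equal to the area of the parallelogram determined by $\mathbf{u}$
and $\mathbf{v}$.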
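As a numerical illustration of Theorem 3.5.3 and Formula 6, the following Python sketch computes the parallelogram area as the norm of the cross product and compares it with the product of the norms and the sine of the angle. The vectors are those of Example 1, reused here as an assumption, and the function names are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two vectors in 3-space."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def norm(v):
    """Euclidean norm."""
    return math.sqrt(dot(v, v))


def parallelogram_area(u, v):
    """Area of the parallelogram determined by u and v (Theorem 3.5.3)."""
    return norm(cross(u, v))


u = (1, 2, -2)
v = (3, 0, 1)
area = parallelogram_area(u, v)

# Recover sin(theta) from cos(theta); valid because 0 <= theta <= pi.
cos_theta = dot(u, v) / (norm(u) * norm(v))
sin_theta = math.sqrt(1 - cos_theta ** 2)

print(area, norm(u) * norm(v) * sin_theta)
```

Here the squared area is 89, and collinear inputs such as (1, 2, 3) and (2, 4, 6) give an area of zero, in line with the closing remark before the theorem.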



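EXAMPLE 4 Area of a Triangle

Find the area of the triangle determined by the points $P_1(2, 2, 0)$, $P_2(-1, 0, 2)$, and $P_3(0, 4, 3)$.

Solution The area $A$ of the triangle is $\tfrac{1}{2}$ the area of the parallelogram determined by the vectors
$\overrightarrow{P_1P_2}$ and $\overrightarrow{P_1P_3}$ (Figure 3.5.5). Using the method discussed in Example 1 of Section 3.1,
$\overrightarrow{P_1P_2} = (-3, -2, 2)$ and $\overrightarrow{P_1P_3} = (-2, 2, 3)$. It follows that

$\overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} = (-10, 5, -10)$

(verify) and consequently that

$A = \tfrac{1}{2}\|\overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3}\| = \tfrac{1}{2}(15) = \tfrac{15}{2}$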



DEFINITION 2 

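If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in 3-space, then

$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$

is called the scalar triple product of $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$.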




Figure 3.5.5 



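The scalar triple product of $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$ can be calculated from the
formula

$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}$ (7)

This follows from Formula 4 since

$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \mathbf{u} \cdot \left( \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix}\mathbf{i} - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix}\mathbf{j} + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}\mathbf{k} \right)$

$= \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix} u_1 - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix} u_2 + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix} u_3$

$= \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}$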

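EXAMPLE 5 Calculating a Scalar Triple Product

Calculate the scalar triple product $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$ of the vectors

$\mathbf{u} = 3\mathbf{i} - 2\mathbf{j} - 5\mathbf{k},\quad \mathbf{v} = \mathbf{i} + 4\mathbf{j} - 4\mathbf{k},\quad \mathbf{w} = 3\mathbf{j} + 2\mathbf{k}$

Solution From 7,

$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} 3 & -2 & -5 \\ 1 & 4 & -4 \\ 0 & 3 & 2 \end{vmatrix} = 3\begin{vmatrix} 4 & -4 \\ 3 & 2 \end{vmatrix} - (-2)\begin{vmatrix} 1 & -4 \\ 0 & 2 \end{vmatrix} + (-5)\begin{vmatrix} 1 & 4 \\ 0 & 3 \end{vmatrix}$

$= 60 + 4 - 15 = 49$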
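The cofactor expansion used in Example 5 translates into a few lines of code. The Python sketch below is illustrative, with hypothetical helper names; it evaluates the scalar triple product both as a 3 x 3 determinant (Formula 7) and directly from the definition as a dot product with a cross product.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


def triple_by_determinant(u, v, w):
    """u . (v x w) via cofactor expansion along the first row (Formula 7)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (u1 * det2(v2, v3, w2, w3)
            - u2 * det2(v1, v3, w1, w3)
            + u3 * det2(v1, v2, w1, w2))


def triple_by_definition(u, v, w):
    """u . (v x w) computed as a dot product with a cross product."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    vxw = (v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1)
    return sum(a * b for a, b in zip(u, vxw))


# Vectors of Example 5, written in component form.
u = (3, -2, -5)
v = (1, 4, -4)
w = (0, 3, 2)
print(triple_by_determinant(u, v, w))  # 49
print(triple_by_definition(u, v, w))   # 49
```

The three cofactor terms are 60, 4 and -15, matching the hand computation.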



Remark The symbol $(\mathbf{u} \cdot \mathbf{v}) \times \mathbf{w}$ makes no sense because we cannot form the cross product of a scalar and a
vector. Thus, no ambiguity arises if we write $\mathbf{u} \cdot \mathbf{v} \times \mathbf{w}$ rather than $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$. However, for clarity we will usually
keep the parentheses.

It follows from 7 that

$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \mathbf{w} \cdot (\mathbf{u} \times \mathbf{v}) = \mathbf{v} \cdot (\mathbf{w} \times \mathbf{u})$

since the $3 \times 3$ determinants that represent these products can be obtained from one another by two row
interchanges. (Verify.) These relationships can be remembered by moving the vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ clockwise around
the vertices of the triangle in Figure 3.5.6.
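The cyclic symmetry can be checked numerically. In this illustrative Python sketch (the helper names are assumptions), the vectors of Example 5 are rotated through the three cyclic orders, and one non-cyclic order is included to show the sign change caused by a single interchange.

```python
def cross(u, v):
    """Cross product of two vectors in 3-space."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def triple(u, v, w):
    """Scalar triple product u . (v x w)."""
    return sum(a * b for a, b in zip(u, cross(v, w)))


u = (3, -2, -5)
v = (1, 4, -4)
w = (0, 3, 2)

cyclic = [triple(u, v, w), triple(w, u, v), triple(v, w, u)]
swapped = triple(u, w, v)  # one interchange of v and w

print(cyclic)   # [49, 49, 49]
print(swapped)  # -49
```

A single swap corresponds to one row interchange in the determinant, which is why the sign flips.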




Figure 3.5.6 



Geometric Interpretation of Determinants 

The next theorem provides a useful geometric interpretation of $2 \times 2$ and $3 \times 3$ determinants.

THEOREM 3.5.4

(a) The absolute value of the determinant

$\det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}$

is equal to the area of the parallelogram in 2-space determined by the vectors $\mathbf{u} = (u_1, u_2)$ and
$\mathbf{v} = (v_1, v_2)$. (See Figure 3.5.7a.)

(b) The absolute value of the determinant

$\det \begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix}$

is equal to the volume of the parallelepiped in 3-space determined by the vectors $\mathbf{u} = (u_1, u_2, u_3)$,
$\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$. (See Figure 3.5.7b.)

Figure 3.5.7



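Proof (a) The key to the proof is to use Theorem 3.5.3. However, that theorem applies to vectors in 3-space,
whereas $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ are vectors in 2-space. To circumvent this "dimension problem," we will
view $\mathbf{u}$ and $\mathbf{v}$ as vectors in the xy-plane of an xyz-coordinate system (Figure 3.5.7c), in which case these vectors are
expressed as $\mathbf{u} = (u_1, u_2, 0)$ and $\mathbf{v} = (v_1, v_2, 0)$. Thus

$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & 0 \\ v_1 & v_2 & 0 \end{vmatrix} = \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}\mathbf{k} = \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\mathbf{k}$

It now follows from Theorem 3.5.3 and the fact that $\|\mathbf{k}\| = 1$ that the area $A$ of the parallelogram determined by $\mathbf{u}$
and $\mathbf{v}$ is

$A = \|\mathbf{u} \times \mathbf{v}\| = \left\| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\mathbf{k} \right\| = \left| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right| \|\mathbf{k}\| = \left| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right|$

which completes the proof.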



Proof (b) As shown in Figure 3.5.8, take the base of the parallelepiped determined by $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ to be the
parallelogram determined by $\mathbf{v}$ and $\mathbf{w}$. It follows from Theorem 3.5.3 that the area of the base is $\|\mathbf{v} \times \mathbf{w}\|$ and, as
illustrated in Figure 3.5.8, the height $h$ of the parallelepiped is the length of the orthogonal projection of $\mathbf{u}$ on $\mathbf{v} \times \mathbf{w}$.
Therefore, by Formula 12 of Section 3.3,

$h = \|\text{proj}_{\mathbf{v} \times \mathbf{w}}\, \mathbf{u}\| = \dfrac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|}$

It follows that the volume $V$ of the parallelepiped is

$V = (\text{area of base}) \cdot \text{height} = \|\mathbf{v} \times \mathbf{w}\| \dfrac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|$

so from 7,

$V = \left| \det \begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix} \right|$ (8)

which completes the proof.




Figure 3.5.8 
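Both parts of Theorem 3.5.4 reduce to taking the absolute value of a determinant. The Python sketch below is illustrative and uses hypothetical function names; the sample data are borrowed from Exercises 11 and 17 of Exercise Set 3.5, whose listed answers are 3 and 16.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


def det3(u, v, w):
    """Determinant of the 3 x 3 matrix with rows u, v, w."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (u1 * det2(v2, v3, w2, w3)
            - u2 * det2(v1, v3, w1, w3)
            + u3 * det2(v1, v2, w1, w2))


def parallelogram_area_2d(u, v):
    """Area of the parallelogram in 2-space determined by u and v."""
    return abs(det2(u[0], u[1], v[0], v[1]))


def parallelepiped_volume(u, v, w):
    """Volume of the parallelepiped in 3-space determined by u, v, w."""
    return abs(det3(u, v, w))


# Exercise 11: sides from P1(1, 2) to P2(4, 4) and from P1 to P4(4, 3).
print(parallelogram_area_2d((3, 2), (3, 1)))                       # 3
# Exercise 17: u = (2, -6, 2), v = (0, 4, -2), w = (2, 2, -4).
print(parallelepiped_volume((2, -6, 2), (0, 4, -2), (2, 2, -4)))   # 16
```

The absolute value matters: both determinants above are negative before it is applied.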



Remark If $V$ denotes the volume of the parallelepiped determined by vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$, then it follows from
Formulas 7 and 8 that

$V = \left[\text{volume of parallelepiped determined by } \mathbf{u}, \mathbf{v}, \text{ and } \mathbf{w}\right] = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|$ (9)

From this result and the discussion immediately following Definition 3 of Section 3.2, we can conclude that

$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \pm V$

where the + or - results depending on whether $\mathbf{u}$ makes an acute or an obtuse angle with $\mathbf{v} \times \mathbf{w}$.

Formula 9 leads to a useful test for ascertaining whether three given vectors lie in the same plane. Since three
vectors not in the same plane determine a parallelepiped of positive volume, it follows from 9 that
$|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| = 0$ if and only if the vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ lie in the same plane. Thus we have the following result.



THEOREM 3.5.5 



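If the vectors $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$ have the same initial point, then
they lie in the same plane if and only if

$\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = 0$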
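The coplanarity test of Theorem 3.5.5 amounts to a single determinant. The Python sketch below is an illustration with hypothetical names: the first triple is taken from Exercise 19 (listed as not coplanar), and the second is an assumed example built so that w = u + v, which forces the three vectors into one plane.

```python
def scalar_triple(u, v, w):
    """u . (v x w), expanded as a 3 x 3 determinant along the first row."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (u1 * (v2 * w3 - v3 * w2)
            - u2 * (v1 * w3 - v3 * w1)
            + u3 * (v1 * w2 - v2 * w1))


def coplanar(u, v, w):
    """True when u, v, w lie in one plane if given a common initial point."""
    return scalar_triple(u, v, w) == 0


# Exercise 19: these vectors do not lie in the same plane.
print(coplanar((-1, -2, 1), (3, 0, -2), (5, -4, 0)))  # False

# Assumed example: w = u + v, so the three vectors are coplanar.
u, v = (1, 0, 2), (0, 1, 1)
w = tuple(a + b for a, b in zip(u, v))
print(coplanar(u, v, w))  # True
```

With floating-point components an exact comparison with zero is too strict, so a small tolerance would be needed there; the integer inputs used here avoid that issue.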



Concept Review 

• Cross product of two vectors 

• Determinant form of cross product 

• Scalar triple product 

Skills 

• Compute the cross product of two vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^3$.

• Know the geometric relationship of $\mathbf{u} \times \mathbf{v}$ to $\mathbf{u}$ and $\mathbf{v}$.

• Know the properties of the cross product (listed in Theorem 3.5.2).

• Compute the scalar triple product of three vectors in 3-space.

• Know the geometric interpretation of the scalar triple product.

• Compute the areas of triangles and parallelograms determined by two vectors or three points in 2-space
or 3-space.

• Use the scalar triple product to determine whether three given vectors in 3-space are coplanar.



Exercise Set 3.5 

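In Exercises 1-2, let $\mathbf{u} = (3, 2, -1)$, $\mathbf{v} = (0, 2, -3)$, and $\mathbf{w} = (2, 6, 7)$. Compute the indicated vectors.

1. (a) $\mathbf{v} \times \mathbf{w}$

(b) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$

(c) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$

Answer:

(a) $(32, -6, -4)$

(b) $(-14, -20, -82)$

(c) $(27, 40, -42)$

2. (a) $(\mathbf{u} \times \mathbf{v}) \times (\mathbf{v} \times \mathbf{w})$

(b) $\mathbf{u} \times (\mathbf{v} - 2\mathbf{w})$

(c) $(\mathbf{u} \times \mathbf{v}) - 2\mathbf{w}$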

In Exercises 3-6, use the cross product to find a vector that is orthogonal to both u and v. 

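3. $\mathbf{u} = (-6, 4, 2)$, $\mathbf{v} = (3, 1, 5)$
Answer:

$(18, 36, -18)$

4. $\mathbf{u} = (1, 1, -2)$, $\mathbf{v} = (2, -1, 2)$

5. $\mathbf{u} = (-2, 1, 5)$, $\mathbf{v} = (3, 0, -3)$

Answer:

$(-3, 9, -3)$

6. $\mathbf{u} = (3, 3, 1)$, $\mathbf{v} = (0, 4, 2)$

In Exercises 7-10, find the area of the parallelogram determined by the given vectors $\mathbf{u}$ and $\mathbf{v}$.

7. $\mathbf{u} = (1, -1, 2)$, $\mathbf{v} = (0, 3, 1)$
Answer:

$\sqrt{59}$

8. $\mathbf{u} = (3, -1, 4)$, $\mathbf{v} = (6, -2, 8)$

9. $\mathbf{u} = (2, 3, 0)$, $\mathbf{v} = (-1, 2, -2)$

Answer:

$\sqrt{101}$

10. $\mathbf{u} = (1, 1, 1)$, $\mathbf{v} = (3, 2, -5)$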

In Exercises 11-12, find the area of the parallelogram with the given vertices. 

11. $P_1(1, 2)$, $P_2(4, 4)$, $P_3(7, 5)$, $P_4(4, 3)$
Answer:

3

12. $P_1(3, 2)$, $P_2(5, 4)$, $P_3(9, 4)$, $P_4(7, 2)$

In Exercises 13-14, find the area of the triangle with the given vertices.
13. $A(2, 0)$, $B(3, 4)$, $C(-1, 2)$
Answer:

7

14. $A(1, 1)$, $B(2, 2)$, $C(3, -3)$

In Exercises 15-16, find the area of the triangle in 3-space that has the given vertices.

15. $P_1(2, 6, -1)$, $P_2(1, 1, 1)$, $P_3(4, 6, 2)$
Answer:

$\dfrac{\sqrt{374}}{2}$

16. $P(1, -1, 2)$, $Q(0, 3, 4)$, $R(6, 1, 8)$

In Exercises 17-18, find the volume of the parallelepiped with sides $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$.

17. $\mathbf{u} = (2, -6, 2)$, $\mathbf{v} = (0, 4, -2)$, $\mathbf{w} = (2, 2, -4)$
Answer:

16

18. $\mathbf{u} = (3, 1, 2)$, $\mathbf{v} = (4, 5, 1)$, $\mathbf{w} = (1, 2, 4)$

In Exercises 19-20, determine whether u, v, and w lie in the same plane when positioned so that their initial 
points coincide. 

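19. $\mathbf{u} = (-1, -2, 1)$, $\mathbf{v} = (3, 0, -2)$, $\mathbf{w} = (5, -4, 0)$

Answer:

The vectors do not lie in the same plane.

20. $\mathbf{u} = (5, -2, 1)$, $\mathbf{v} = (4, -1, 1)$, $\mathbf{w} = (1, -1, 0)$

In Exercises 21-24, compute the scalar triple product $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$.

21. $\mathbf{u} = (-2, 0, 6)$, $\mathbf{v} = (1, -3, 1)$, $\mathbf{w} = (-5, -1, 1)$
Answer:

$-92$

22. $\mathbf{u} = (-1, 2, 4)$, $\mathbf{v} = (3, 4, -2)$, $\mathbf{w} = (-1, 2, 5)$

23. $\mathbf{u} = (a, 0, 0)$, $\mathbf{v} = (0, b, 0)$, $\mathbf{w} = (0, 0, c)$

Answer:

$abc$

24. $\mathbf{u} = (3, -1, 6)$, $\mathbf{v} = (2, 4, 3)$, $\mathbf{w} = (5, -1, 2)$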

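In Exercises 25-26, suppose that $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = 3$. Find

25. (a) $\mathbf{u} \cdot (\mathbf{w} \times \mathbf{v})$

(b) $(\mathbf{v} \times \mathbf{w}) \cdot \mathbf{u}$

(c) $\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v})$

Answer:

(a) $-3$

(b) 3

(c) 3

26. (a) $\mathbf{v} \cdot (\mathbf{u} \times \mathbf{w})$

(b) $(\mathbf{u} \times \mathbf{w}) \cdot \mathbf{v}$

(c) $\mathbf{v} \cdot (\mathbf{w} \times \mathbf{w})$

27. (a) Find the area of the triangle having vertices $A(1, 0, 1)$, $B(0, 2, 3)$, and $C(2, 1, 0)$.

(b) Use the result of part (a) to find the length of the altitude from vertex $C$ to side $AB$.

Answer:

(a) $\dfrac{\sqrt{26}}{2}$

(b) $\dfrac{\sqrt{26}}{3}$

28. Use the cross product to find the sine of the angle between the vectors $\mathbf{u} = (2, 3, -6)$ and $\mathbf{v} = (2, 3, 6)$.

29. Simplify $(\mathbf{u} + \mathbf{v}) \times (\mathbf{u} - \mathbf{v})$.

Answer:
$2(\mathbf{v} \times \mathbf{u})$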

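30. Let $\mathbf{a} = (a_1, a_2, a_3)$, $\mathbf{b} = (b_1, b_2, b_3)$, $\mathbf{c} = (c_1, c_2, c_3)$, and $\mathbf{d} = (d_1, d_2, d_3)$. Show that

$(\mathbf{a} + \mathbf{d}) \cdot (\mathbf{b} \times \mathbf{c}) = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) + \mathbf{d} \cdot (\mathbf{b} \times \mathbf{c})$

31. Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be nonzero vectors in 3-space with the same initial point, but such that no two of them are
collinear. Show that

(a) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$ lies in the plane determined by $\mathbf{v}$ and $\mathbf{w}$.

(b) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$ lies in the plane determined by $\mathbf{u}$ and $\mathbf{v}$.

32. Prove the following identities.

(a) $(\mathbf{u} + k\mathbf{v}) \times \mathbf{v} = \mathbf{u} \times \mathbf{v}$

(b) $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{z}) = -(\mathbf{u} \times \mathbf{z}) \cdot \mathbf{v}$

33. Prove: If $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{d}$ lie in the same plane, then $(\mathbf{a} \times \mathbf{b}) \times (\mathbf{c} \times \mathbf{d}) = \mathbf{0}$.

34. Prove: If $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$ and $\mathbf{u} \cdot \mathbf{v} \ne 0$, then $\tan\theta = \|\mathbf{u} \times \mathbf{v}\| / (\mathbf{u} \cdot \mathbf{v})$.

35. Show that if $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in $R^3$, no two of which are collinear, then $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$ lies in the plane
determined by $\mathbf{v}$ and $\mathbf{w}$.

36. It is a theorem of solid geometry that the volume of a tetrahedron is $\tfrac{1}{3}(\text{area of base}) \cdot (\text{height})$. Use this result
to prove that the volume of a tetrahedron whose sides are the vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ is $\tfrac{1}{6}|\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})|$ (see the
accompanying figure).

Figure Ex-36

37. Use the result of Exercise 36 to find the volume of the tetrahedron with vertices $P$, $Q$, $R$, $S$.

(a) $P(-1, 2, 0)$, $Q(2, 1, -3)$, $R(1, 1, 1)$, $S(3, -2, 3)$

(b) $P(0, 0, 0)$, $Q(1, 2, -1)$, $R(3, 4, 0)$, $S(-1, -3, 4)$

Answer:

(a) $\tfrac{17}{6}$

(b) $\tfrac{1}{2}$

38. Prove part (d) of Theorem 3.5.1. [Hint: First prove the result in the case where $\mathbf{w} = \mathbf{i} = (1, 0, 0)$, then when
$\mathbf{w} = \mathbf{j} = (0, 1, 0)$, and then when $\mathbf{w} = \mathbf{k} = (0, 0, 1)$. Finally, prove it for an arbitrary vector $\mathbf{w} = (w_1, w_2, w_3)$
by writing $\mathbf{w} = w_1\mathbf{i} + w_2\mathbf{j} + w_3\mathbf{k}$.]

39. Prove part (e) of Theorem 3.5.1. [Hint: Apply part (a) of Theorem 3.5.2 to the result in part (d) of Theorem
3.5.1.]

40. Prove:

(a) Prove (b) of Theorem 3.5.2.

(b) Prove (c) of Theorem 3.5.2.

(c) Prove (d) of Theorem 3.5.2.

(d) Prove (e) of Theorem 3.5.2.

(e) Prove (f) of Theorem 3.5.2.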

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) The cross product of two nonzero vectors u and v is a nonzero vector if and only if u and v are not parallel. 
Answer: 

True 

(b) A normal vector to a plane can be obtained by taking the cross product of two nonzero and noncollinear vectors 
lying in the plane. 

Answer: 

True 

(c) The scalar triple product of u, v, and w determines a vector whose length is equal to the volume of the 
parallelepiped determined by u, v, and w. 



Answer: 

False 

(d) If u and v are vectors in 3-space, then ||v x u|| is equal to the area of the parallelogram determined by u and v. 
Answer: 

True 

(e) For all vectors u, v, and w in 3-space, the vectors (u x v) x w and u x (v x w) are the same. 
Answer: 

False 

(f) If u, V, and w are vectors in /J^, where u is nonzero and u x v = u x then v = w. 
Answer: 

False 
Chapter 3 Supplementary Exercises

1. Let u = (-2, 0, 4), v = (3, -1, 6), and w = (2, -5, -5). Compute

(a) 3v - 2u

(b) ||u + v + w||

(c) the distance between -3u and v + 5w

(d) proj_w u

(e) u · (v × w)

(f) (-5v + w) × ((u · v)w)

Answer:

(a) 3v - 2u = (13, -3, 10)

(b) ||u + v + w|| = √70

(c) √774

(e) u · (v × w) = -122

(f) (-5v + w) × ((u · v)w) = (-3150, -2430, 1170)

2. Repeat Exercise 1 for the vectors u = 3i - 5j + k, v = -2i + 2k, and w = -j + 4k.

3. Repeat parts (a)-(d) of Exercise 1 for the vectors u = (-2, 6, 2, 1), v = (-3, 0, 8, 0), and
w = (9, 1, -6, -6).

Answer:

(a) 3v - 2u = (-5, -12, 20, -2)

(b) ||u + v + w|| = √106

(c) √2810

(d) proj_w u = -(15/77)(9, 1, -6, -6)

4. Repeat parts (a)-(d) of Exercise 1 for the vectors u = (0, 5, 0, -1, -2), v = (1, -1, 6, -2, 0), and
w = (-4, -1, 4, 0, 2).

In Exercises 5-6, determine whether the given set of vectors forms an orthogonal set. If so, normalize each
vector to form an orthonormal set.



5. (-32, -1,19), (3, -1,5), (1,6, 2) 



Answer: 



Not an orthogonal set 



6. (-2, 0, 1), (1, 1, 2), (1, -5, 2)

7. (a) The set of all vectors in R^2 that are orthogonal to a nonzero vector is what kind of geometric object?

(b) The set of all vectors in R^3 that are orthogonal to a nonzero vector is what kind of geometric object?

(c) The set of all vectors in R^2 that are orthogonal to two noncollinear vectors is what kind of geometric
object?

(d) The set of all vectors in R^3 that are orthogonal to two noncollinear vectors is what kind of geometric
object?

Answer: 

(a) A line through the origin, perpendicular to the given vector. 

(b) A plane through the origin, perpendicular to the given vector. 

(c) {0} (the origin) 

(d) A line through the origin, perpendicular to the plane containing the two noncollinear vectors. 

^' Show that VI = ^ j and V2 = -j, — j j are orthonormal vectors, and find a third vector V3 for 

which {vi, V2, V3} is an orthonormal set. 

9. True or False: If u and v are nonzero vectors such that ||u + v||^2 = ||u||^2 + ||v||^2, then u and v are
orthogonal.

Answer: 

True 

10. True or False: If u is orthogonal to v + w, then u is orthogonal to v and w.

11. Consider the points P(3, -1, 4), Q(6, 0, 2), and R(5, 1, 1). Find the point S in R^3 whose first
component is -1 and such that PQ is parallel to RS.

Answer:
S(-1, -1, 5)

12. Consider the points P(-3, 1, 0, 6), Q(0, 5, 1, -2), and R(-4, 1, 4, 0). Find the point S in R^4 whose
third component is 6 and such that PQ is parallel to RS.

13. Using the points in Exercise 11, find the cosine of the angle between the vectors PQ and PR.
Answer:

√14 / √17

14. Using the points in Exercise 12, find the cosine of the angle between the vectors PQ and PR.

15. Find the distance between the point P(-3, 1, 3) and the plane 5x + z = 3y - 4.

Answer:

11/√35

16. Show that the planes 3x - y + 6z = 7 and -6x + 2y - 12z = 1 are parallel, and find the distance
between the planes.

In Exercises 17-22, find vector and parametric equations for the line or plane in question. 

17. The plane in R^3 that contains the points P(-2, 1, 3), Q(-1, -1, 1), and R(3, 0, -2).
Answer:

Vector equation: (x, y, z) = (-2, 1, 3) + t1(1, -2, -2) + t2(5, -1, -5);

parametric equations: x = -2 + t1 + 5t2, y = 1 - 2t1 - t2, z = 3 - 2t1 - 5t2

18. The line in R^3 that contains the point P(-1, 6, 0) and is orthogonal to the plane 4x - z = 5.

19. The line in R^2 that is parallel to the vector v = (8, -1) and contains the point P(0, -3).

Answer:

Vector equation: (x, y) = (0, -3) + t(8, -1);

parametric equations: x = 8t, y = -3 - t

20. The plane in R^3 that contains the point P(-2, 1, 0) and is parallel to the plane -8x + 6y - z = 4.

21. The line in R^2 with equation y = 3x - 5.

Answer:

A possible answer is vector equation: (x, y) = (0, -5) + t(1, 3); parametric equations:

x = t, y = -5 + 3t

22. The plane in R^3 with equation 2x - 6y + 3z = 5.

In Exercises 23-25, find a point-normal equation for the given plane. 

23. The plane that is represented by the vector equation
(x, y, z) = (-1, 5, 6) + t1(0, -1, 3) + t2(2, -1, 0).

Answer:

3(x + 1) + 6(y - 5) + 2(z - 6) = 0

24. The plane that contains the point P(-5, 1, 0) and is orthogonal to the line with parametric equations
x = 3 - 5t, y = 2t, and z = 7.

25. The plane that passes through the points P(9, 0, 4), Q(-1, 4, 3), and R(0, 6, -2).
Answer:

-18(x - 9) - 51y - 24(z - 4) = 0



26. Suppose that {v1, v2, v3} and {w1, w2} are two sets of vectors such that vi and wj are orthogonal for
all i and j. Prove that if a1, a2, a3, b1, b2 are any scalars, then the vectors v = a1v1 + a2v2 + a3v3 and
w = b1w1 + b2w2 are orthogonal.

27. Prove that if two vectors u and v in R^2 are orthogonal to a nonzero vector w in R^2, then u and v are scalar
multiples of each other.

28. Prove that ||u + v|| = ||u|| + ||v|| if and only if u and v are parallel vectors.

29. The equation Ax + By = 0 represents a line through the origin in R^2 if A and B are not both zero. What
does this equation represent in R^3 if you think of it as Ax + By + 0z = 0? Explain.

Answer: 

A plane 
CHAPTER 4

General Vector Spaces
INTRODUCTION



Recall that we began our study of vectors by viewing them as directed line segments
(arrows). We then extended this idea by introducing rectangular coordinate systems, which
enabled us to view vectors as ordered pairs and ordered triples of real numbers. As we
developed properties of these vectors we noticed patterns in various formulas that enabled
us to extend the notion of a vector to an n-tuple of real numbers. Although n-tuples took
us outside the realm of our "visual experience," they gave us a valuable tool for
understanding and studying systems of linear equations. In this chapter we will extend the
concept of a vector yet again by using the most important algebraic properties of vectors
in R^n as axioms. These axioms, if satisfied by a set of objects, will enable us to think of
those objects as vectors.
4.1 Real Vector Spaces

In this section we will extend the concept of a vector by using the basic properties of vectors in R^n as axioms, which, if satisfied
by a set of objects, guarantee that those objects behave like familiar vectors.



Vector Space Axioms 

The following definition consists of ten axioms, eight of which are properties of vectors in R^n that were stated in Theorem 3.1.1.
It is important to keep in mind that one does not prove axioms; rather, they are assumptions that serve as the starting point for 
proving theorems. 



Vector space scalars can be real numbers or complex 
numbers. Vector spaces with real scalars are called real 
vector spaces and those with complex scalars are called 
complex vector spaces. For now we will be concerned 
exclusively with real vector spaces. We will consider 
complex vector spaces later. 



DEFINITION 1 



Let V be an arbitrary nonempty set of objects on which two operations are defined: addition, and multiplication by
scalars. By addition we mean a rule for associating with each pair of objects u and v in V an object u + v, called the
sum of u and v; by scalar multiplication we mean a rule for associating with each scalar k and each object u in V an
object ku, called the scalar multiple of u by k. If the following axioms are satisfied by all objects u, v, w in V and all
scalars k and m, then we call V a vector space and we call the objects in V vectors.

1. If u and v are objects in V, then u + v is in V.

2. u + v = v + u

3. u + (v + w) = (u + v) + w

4. There is an object 0 in V, called a zero vector for V, such that 0 + u = u + 0 = u for all u in V.

5. For each u in V, there is an object -u in V, called a negative of u, such that u + (-u) = (-u) + u = 0.

6. If k is any scalar and u is any object in V, then ku is in V.

7. k(u + v) = ku + kv

8. (k + m)u = ku + mu

9. k(mu) = (km)(u)

10. 1u = u



Observe that the definition of a vector space does not specify the nature of the vectors or the operations. Any kind of object can 
be a vector, and the operations of addition and scalar multiplication need not have any relationship to those on R^n. The only
requirement is that the ten vector space axioms be satisfied. In the examples that follow we will use four basic steps to show 
that a set with two operations is a vector space. 

To Show that a Set with Two Operations is a Vector Space

Step 1 Identify the set V of objects that will become vectors.

Step 2 Identify the addition and scalar multiplication operations on V.

Step 3 Verify Axioms 1 and 6; that is, adding two vectors in V produces a vector in V, and multiplying a vector in V by
a scalar also produces a vector in V. Axiom 1 is called closure under addition, and Axiom 6 is called closure under
scalar multiplication.

Step 4 Confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold.
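The four steps above lend themselves to a quick numerical spot check. The following Python sketch is an illustration only and is not part of the text's method: the function name failed_axioms, the sample vectors, and the sample scalars are assumptions made here. It tests Axioms 2-5 and 7-10 on a finite list of integer pairs with the standard operations of R^2. Passing such a check proves nothing (a proof must cover all vectors and all scalars), but a failure pinpoints an axiom that is violated.

```python
import itertools

def failed_axioms(samples, scalars, add, smul, zero, neg):
    """Return the sorted axiom numbers (among 2-5, 7-10) violated on the samples."""
    failed = set()
    for u, v, w in itertools.product(samples, repeat=3):
        if add(u, v) != add(v, u):
            failed.add(2)
        if add(u, add(v, w)) != add(add(u, v), w):
            failed.add(3)
    for u in samples:
        if add(zero, u) != u or add(u, zero) != u:
            failed.add(4)
        if add(u, neg(u)) != zero or add(neg(u), u) != zero:
            failed.add(5)
        if smul(1, u) != u:
            failed.add(10)
        for k, m in itertools.product(scalars, repeat=2):
            if smul(k + m, u) != add(smul(k, u), smul(m, u)):
                failed.add(8)
            if smul(k, smul(m, u)) != smul(k * m, u):
                failed.add(9)
    for u, v in itertools.product(samples, repeat=2):
        for k in scalars:
            if smul(k, add(u, v)) != add(smul(k, u), smul(k, v)):
                failed.add(7)
    return sorted(failed)

# Standard operations on R^2; integer entries keep the == comparisons exact.
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def smul(k, u):
    return (k * u[0], k * u[1])

def neg(u):
    return (-u[0], -u[1])

samples = [(0, 0), (1, 2), (-3, 5), (2, -4)]   # illustrative sample vectors
scalars = [-2, 0, 1, 3]                         # illustrative sample scalars
print(failed_axioms(samples, scalars, add, smul, (0, 0), neg))  # prints []
```

Closure (Axioms 1 and 6) is not tested here because it concerns membership in V, which has to be argued from the definition of the set itself.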




Hermann Günther Grassmann (1809-1877)

Historical Note The notion of an "abstract vector space" evolved over many years and had many contributors. The
idea crystallized with the work of the German mathematician H. G. Grassmann, who published a paper in 1862 in which
he considered abstract systems of unspecified elements on which he defined formal operations of addition and scalar 
multiplication. Grassmann's work was controversial, and others, including Augustin Cauchy (p. 137), laid reasonable 
claim to the idea. 

[Image: (c)Sueddeutsche Zeitung Photo/The Image Works] 



Our first example is the simplest of all vector spaces in that it contains only one object. Since Axiom 4 requires that every 
vector space contain a zero vector, the object will have to be that vector. 

EXAMPLE 1 The Zero Vector Space

Let V consist of a single object, which we denote by 0, and define

0 + 0 = 0 and k0 = 0

for all scalars k. It is easy to check that all the vector space axioms are satisfied. We call this the zero vector 
space. 



Our second example is one of the most important of all vector spaces, the familiar space R^n. It should not be surprising that
the operations on R^n satisfy the vector space axioms because those axioms were based on known properties of operations on
R^n.

EXAMPLE 2 R^n Is a Vector Space

Let V = R^n, and define the vector space operations on V to be the usual operations of addition and scalar
multiplication of n-tuples; that is,

u + v = (u1, u2, ..., un) + (v1, v2, ..., vn) = (u1 + v1, u2 + v2, ..., un + vn)
ku = (ku1, ku2, ..., kun)

The set V = R^n is closed under addition and scalar multiplication because the foregoing operations produce
n-tuples as their end result, and these operations satisfy Axioms 2, 3, 4, 5, 7, 8, 9, and 10 by virtue of Theorem
3.1.1.



Our next example is a generalization of R^n in which we allow vectors to have infinitely many components.

EXAMPLE 3 The Vector Space of Infinite Sequences of Real Numbers

Let V consist of objects of the form

u = (u1, u2, ..., un, ...)

in which u1, u2, ..., un, ... is an infinite sequence of real numbers. We define two infinite sequences to be equal if
their corresponding components are equal, and we define addition and scalar multiplication componentwise by

u + v = (u1, u2, ..., un, ...) + (v1, v2, ..., vn, ...)
      = (u1 + v1, u2 + v2, ..., un + vn, ...)
ku = (ku1, ku2, ..., kun, ...)

We leave it as an exercise to confirm that V with these operations is a vector space. We will denote this vector
space by the symbol R^∞.



In the next example our vectors will be matrices. This may be a little confusing at first because matrices are composed of rows 
and columns, which are themselves vectors (row vectors and column vectors). However, here we will not be concerned with the 
individual rows and columns but rather with the properties of the matrix operations as they relate to the matrix as a whole. 

Note that Equation 1 involves three different addition 
operations: the addition operation on vectors, the 
addition operation on matrices, and the addition 
operation on real numbers. 



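EXAMPLE 4 A Vector Space of 2 × 2 Matrices

Let V be the set of 2 × 2 matrices with real entries, and take the vector space operations on V to be the usual
operations of matrix addition and scalar multiplication; that is,

$$u + v = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} u_{11} + v_{11} & u_{12} + v_{12} \\ u_{21} + v_{21} & u_{22} + v_{22} \end{bmatrix}$$

$$ku = k \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} ku_{11} & ku_{12} \\ ku_{21} & ku_{22} \end{bmatrix} \qquad (1)$$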



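The set V is closed under addition and scalar multiplication because the foregoing operations produce 2 × 2
matrices as the end result. Thus, it remains to confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold. Some of these
are standard properties of matrix operations. For example, Axiom 2 follows from Theorem 1.4.1a since

$$u + v = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} + \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = v + u$$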



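Similarly, Axioms 3, 7, 8, and 9 follow from parts (b), (h), (j), and (e), respectively, of that theorem (verify). This
leaves Axioms 4, 5, and 10 that remain to be verified.

To confirm that Axiom 4 is satisfied, we must find a 2 × 2 matrix 0 in V for which u + 0 = 0 + u = u for all 2 × 2
matrices u in V. We can do this by taking

$$0 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

With this definition,

$$0 + u = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = u$$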



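and similarly u + 0 = u. To verify that Axiom 5 holds we must show that each object u in V has a negative -u in
V such that u + (-u) = 0 and (-u) + u = 0. This can be done by defining the negative of u to be

$$-u = \begin{bmatrix} -u_{11} & -u_{12} \\ -u_{21} & -u_{22} \end{bmatrix}$$

With this definition,

$$u + (-u) = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} -u_{11} & -u_{12} \\ -u_{21} & -u_{22} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0$$

and similarly (-u) + u = 0. Finally, Axiom 10 holds because

$$1u = 1 \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = u$$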



EXAMPLE 5 The Vector Space of m × n Matrices

Example 4 is a special case of a more general class of vector spaces. You should have no trouble adapting the
argument used in that example to show that the set V of all m × n matrices with the usual matrix operations of
addition and scalar multiplication is a vector space. We will denote this vector space by the symbol M_mn. Thus,
for example, the vector space in Example 4 is denoted as M_22.



In Example 6 the functions were defined on the entire
interval (-∞, ∞). However, the arguments used in
that example apply as well on all subintervals of
(-∞, ∞), such as a closed interval [a, b] or an open
interval (a, b). We will denote the vector spaces of
functions on these intervals by F[a, b] and F(a, b),
respectively.



EXAMPLE 6 The Vector Space of Real-Valued Functions

Let V be the set of real-valued functions that are defined at each x in the interval (-∞, ∞). If f = f(x) and
g = g(x) are two functions in V and if k is any scalar, then define the operations of addition and scalar
multiplication by

(f + g)(x) = f(x) + g(x)    (2)

(kf)(x) = kf(x)    (3)

One way to think about these operations is to view the numbers f(x) and g(x) as "components" of f and g at the
point x, in which case Equations 2 and 3 state that two functions are added by adding corresponding components,
and a function is multiplied by a scalar by multiplying each component by that scalar, exactly as in R^n and R^∞.
This idea is illustrated in parts (a) and (b) of Figure 4.1.1. The set V with these operations is denoted by the
symbol F(-∞, ∞). We can prove that this is a vector space as follows:



Axioms 1 and 6 These closure axioms require that sums and scalar multiples of functions that are defined at each x
in the interval (-∞, ∞) are also defined at each x in the interval (-∞, ∞). This follows from Formulas 2 and 3.

Axiom 4 This axiom requires that there exists a function 0 in F(-∞, ∞), which when added to any other
function f in F(-∞, ∞) produces f back again as the result. The function whose value at every point x in the
interval (-∞, ∞) is zero has this property. Geometrically, the graph of the function 0 is the line that
coincides with the x-axis.

Axiom 5 This axiom requires that for each function f in F(-∞, ∞) there exists a function -f in
F(-∞, ∞), which when added to f produces the function 0. The function defined by (-f)(x) = -f(x) has
this property. The graph of -f can be obtained by reflecting the graph of f about the x-axis (Figure 4.1.1c).

Axioms 2, 3, 7, 8, 9, 10 The validity of each of these axioms follows from properties of real numbers. For example,
if f and g are functions in F(-∞, ∞), then Axiom 2 requires that f + g = g + f. This follows from the
computation

(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x)

in which the first and last equalities follow from 2, and the middle equality is a property of real numbers. We will
leave the proofs of the remaining parts as exercises.
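The pointwise operations of Formulas 2 and 3 translate directly into code when functions are treated as values. The Python sketch below is an illustration under assumptions made here: the helper names fadd, fscale, fneg, and zero, the choice of f and g, and the sample points are not from the text. It evaluates Axioms 2, 4, and 5 at a handful of points, which illustrates the axioms but does not prove them.

```python
import math

def fadd(f, g):
    """Pointwise sum, Formula (2): (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def fscale(k, f):
    """Pointwise scalar multiple, Formula (3): (kf)(x) = k f(x)."""
    return lambda x: k * f(x)

def zero(x):
    """The zero vector of F(-inf, inf): the function that is 0 everywhere."""
    return 0.0

def fneg(f):
    """The negative of f: (-f)(x) = -f(x)."""
    return lambda x: -f(x)

f = math.sin                      # illustrative choice of f
g = lambda x: x * x               # illustrative choice of g
points = [-2.0, 0.0, 0.5, 3.0]    # illustrative sample points

axiom2 = all(fadd(f, g)(x) == fadd(g, f)(x) for x in points)
axiom4 = all(fadd(f, zero)(x) == f(x) for x in points)
axiom5 = all(fadd(f, fneg(f))(x) == 0.0 for x in points)
print(axiom2, axiom4, axiom5)
```

Each check reduces to a property of real numbers at a single point x, which mirrors the way the axioms are verified in the example.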




Figure 4.1.1 

It is important to recognize that you cannot impose any two operations on any set V and expect the vector space axioms to hold.
For example, if V is the set of n-tuples with positive components, and if the standard operations from R^n are used, then V is not
closed under scalar multiplication, because if u is a nonzero n-tuple in V, then (-1)u has at least one negative component and
hence is not in V. The following is a less obvious example in which only one of the ten vector space axioms fails to hold.

EXAMPLE 7 A Set That Is Not a Vector Space

Let V = R^2 and define addition and scalar multiplication operations as follows: If u = (u1, u2) and v = (v1, v2),
then define

u + v = (u1 + v1, u2 + v2)

and if k is any real number, then define

ku = (ku1, 0)

For example, if u = (2, 4), v = (-3, 5), and k = 7, then

u + v = (2 + (-3), 4 + 5) = (-1, 9)
ku = 7u = (7 · 2, 0) = (14, 0)

The addition operation is the standard one from R^2, but the scalar multiplication is not. In the exercises we will
ask you to show that the first nine vector space axioms are satisfied. However, Axiom 10 fails to hold for certain
vectors. For example, if u = (u1, u2) is such that u2 ≠ 0, then

1u = 1(u1, u2) = (1 · u1, 0) = (u1, 0) ≠ u

Thus, V is not a vector space with the stated operations.
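The computations in Example 7 can be replayed in a few lines. This Python sketch is only an illustration; the function names add and smul are assumptions made here. It uses the same u, v, and k as the example and then shows that multiplying by the scalar 1 does not return u.

```python
def add(u, v):
    """Standard addition on R^2."""
    return (u[0] + v[0], u[1] + v[1])

def smul(k, u):
    """Nonstandard scalar multiplication of Example 7: ku = (k*u1, 0)."""
    return (k * u[0], 0)

u, v, k = (2, 4), (-3, 5), 7
print(add(u, v))         # (-1, 9)
print(smul(k, u))        # (14, 0)
print(smul(1, u), u)     # (2, 0) (2, 4), so Axiom 10 fails for this u
```

A single counterexample such as u = (2, 4) is enough to show that an axiom fails, whereas showing that an axiom holds requires an argument for every vector.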



Our final example will be an unusual vector space that we have included to illustrate how varied vector spaces can be. Since the 
objects in this space will be real numbers, it will be important for you to keep track of which operations are intended as vector 
operations and which ones as ordinary operations on real numbers. 

EXAMPLE 8 An Unusual Vector Space

Let V be the set of positive real numbers, and define the operations on V to be

u + v = uv [Vector addition is numerical multiplication.]

ku = u^k [Scalar multiplication is numerical exponentiation.]

Thus, for example, 1 + 1 = 1 and (2)(1) = 1^2 = 1. This is strange indeed, but nevertheless the set V with these
operations satisfies the 10 vector space axioms and hence is a vector space. We will confirm Axioms 4, 5, and 7,
and leave the others as exercises.

• Axiom 4: The zero vector in this space is the number 1 (i.e., 0 = 1) since u + 1 = u · 1 = u

• Axiom 5: The negative of a vector u is its reciprocal (i.e., -u = 1/u) since u + (1/u) = u(1/u) = 1 (= 0)

• Axiom 7: k(u + v) = (uv)^k = u^k v^k = (ku) + (kv)
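Because the vector operations in Example 8 are ordinary arithmetic in disguise, it helps to see them spelled out with separate names. The Python sketch below is illustrative only; the names vadd, smul, neg, and ZERO and the sample values are assumptions made here. It spot-checks Axioms 4, 5, and 7 for one choice of u, v, and k, using a floating-point tolerance for the comparisons.

```python
import math

def vadd(u, v):
    """Vector addition in Example 8 is numerical multiplication."""
    return u * v

def smul(k, u):
    """Scalar multiplication in Example 8 is numerical exponentiation."""
    return u ** k

ZERO = 1.0          # the zero vector of this space is the number 1

def neg(u):
    """The negative of u is its reciprocal."""
    return 1.0 / u

u, v, k = 2.0, 5.0, 3.0   # illustrative positive reals and scalar

axiom4 = math.isclose(vadd(u, ZERO), u)
axiom5 = math.isclose(vadd(u, neg(u)), ZERO)
axiom7 = math.isclose(smul(k, vadd(u, v)), vadd(smul(k, u), smul(k, v)))
print(axiom4, axiom5, axiom7)
```

Keeping vadd and smul distinct from the built-in + and * is one way to keep track of which operations are vector operations and which are ordinary operations on real numbers.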



Some Properties of Vectors 

The following is our first theorem about general vector spaces. As you will see, its proof is very formal with each step being 
justified by a vector space axiom or a known property of real numbers. There will not be many rigidly formal proofs of this type 
in the text, but we have included these to reinforce the idea that the familiar properties of vectors can all be derived from the 
vector space axioms. 

THEOREM 4.1.1 

Let V be a vector space, u a vector in V, and k a scalar; then:

(a) 0u = 0

(b) k0 = 0

(c) (-1)u = -u

(d) If ku = 0, then k = 0 or u = 0.

We will prove parts (a) and (c) and leave proofs of the remaining parts as exercises.
Proof (a) We can write

0u + 0u = (0 + 0)u    [Axiom 8]
        = 0u          [Property of the number 0]

By Axiom 5 the vector 0u has a negative, -0u. Adding this negative to both sides above yields

[0u + 0u] + (-0u) = 0u + (-0u)

or

0u + [0u + (-0u)] = 0u + (-0u)    [Axiom 3]
0u + 0 = 0                        [Axiom 5]
0u = 0                            [Axiom 4]

Proof (c) To prove that (-1)u = -u, we must show that u + (-1)u = 0. The proof is as follows:

u + (-1)u = 1u + (-1)u     [Axiom 10]
          = (1 + (-1))u    [Axiom 8]
          = 0u             [Property of numbers]
          = 0              [Part (a) of this theorem]



A Closing Observation 

This section of the text is very important to the overall plan of linear algebra in that it establishes a common thread between 
such diverse mathematical objects as geometric vectors, vectors in R^n, infinite sequences, matrices, and real-valued functions,
to name a few. As a result, whenever we discover a new theorem about general vector spaces, we will at the same time be
discovering a theorem about geometric vectors, vectors in R^n, sequences, matrices, real-valued functions, and about any new
kinds of vectors that we might discover. 

To illustrate this idea, consider what the rather innocent-looking result in part (a) of Theorem 4.1.1 says about the vector space
in Example 8. Keeping in mind that the vectors in that space are positive real numbers, that scalar multiplication means
numerical exponentiation, and that the zero vector is the number 1, the equation

0u = 0

is a statement of the fact that if u is a positive real number, then

u^0 = 1



Concept Review 

• Vector space 

• Closure under addition 

• Closure under scalar multiplication 

• Examples of vector spaces 

Skills 

• Determine whether a given set with two operations is a vector space. 

• Show that a set with two operations is not a vector space by demonstrating that at least one of the vector space axioms 
fails. 



Exercise Set 4.1 



1. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations
on u = (u1, u2) and v = (v1, v2):

u + v = (u1 + v1, u2 + v2), ku = (0, ku2)

(a) Compute u + v and ku for u = (-1, 2), v = (3, 4) and k = 3.

(b) In words, explain why V is closed under addition and scalar multiplication.

(c) Since addition on V is the standard addition operation on R^2, certain vector space axioms hold for V because they are
known to hold for R^2. Which axioms are they?

(d) Show that Axioms 7, 8, and 9 hold.

(e) Show that Axiom 10 fails and hence that V is not a vector space under the given operations.
Answer:

(a) u + v = (2, 6), 3u = (0, 6)

(c) Axioms 1-5

2. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations
on u = (u1, u2) and v = (v1, v2):

u + v = (u1 + v1 + 1, u2 + v2 + 1), ku = (ku1, ku2)

(a) Compute u + v and ku for u = (0, 4), v = (1, -3), and k = 2.

(b) Show that (0, 0) ≠ 0.

(c) Show that (-1, -1) = 0.

(d) Show that Axiom 5 holds by producing an ordered pair -u such that u + (-u) = 0 for u = (u1, u2).

(e) Find two vector space axioms that fail to hold.

In Exercises 3-12, determine whether each set equipped with the given operations is a vector space. For those that are not 
vector spaces identify the vector space axioms that fail. 

3. The set of all real numbers with the standard operations of addition and multiplication. 
Answer: 

The set is a vector space with the given operations. 

4. The set of all pairs of real numbers of the form (x, 0) with the standard operations on R^2.

5. The set of all pairs of real numbers of the form (x, y), where x ≥ 0, with the standard operations on R^2.

Answer:

Not a vector space. Axioms 5 and 6 fail.

6. The set of all n-tuples of real numbers that have the form (x, x, ..., x) with the standard operations on R^n.

7. The set of all triples of real numbers with the standard vector addition but with scalar multiplication defined by 

Answer: 

Not a vector space. Axiom 8 fails. 

8. The set of all 2 x 2 invertible matrices with the standard matrix addition and scalar multiplication. 

9. The set of all 2 x 2 matrices of the form 



with the standard matrix addition and scalar multiplication. 
Answer: 

The set is a vector space with the given operations. 

10. The set of all real-valued functions f defined everywhere on the real line and such that f(1) = 0 with the operations used
in Example 6.

11. The set of all pairs of real numbers of the form (1, x) with the operations

(1, y) + (1, y') = (1, y + y') and k(1, y) = (1, ky)

Answer:

The set is a vector space with the given operations.

12. The set of polynomials of the form a0 + a1x with the operations

(a0 + a1x) + (b0 + b1x) = (a0 + b0) + (a1 + b1)x

and

k(a0 + a1x) = (ka0) + (ka1)x

13. Verify Axioms 3, 7, 8, and 9 for the vector space given in Example 4. 

14. Verify Axioms 1, 2, 3, 7, 8, 9, and 10 for the vector space given in Example 6. 

15. With the addition and scalar multiplication operations defined in Example 7, show that V = R^2 satisfies Axioms 1-9.

16. Verify Axioms 1, 2, 3, 6, 8, 9, and 10 for the vector space given in Example 8.

17. Show that the set of all points in R^2 lying on a line is a vector space with respect to the standard operations of vector
addition and scalar multiplication if and only if the line passes through the origin.

18. Show that the set of all points in R^3 lying in a plane is a vector space with respect to the standard operations of vector
addition and scalar multiplication if and only if the plane passes through the origin.

In Exercises 19-21, prove that the given set with the stated operations is a vector space. 

19. The set V = {0} with the operations of addition and scalar multiplication given in Example 1.

20. The set R^∞ of all infinite sequences of real numbers with the operations of addition and scalar multiplication given in
Example 3.

21. The set M_mn of all m × n matrices with the usual operations of addition and scalar multiplication.

22. Prove part (d) of Theorem 4.1.1. 

23. The argument that follows proves that if u, v, and w are vectors in a vector space V such that u + w = v + w, then u = v
(the cancellation law for vector addition). As illustrated, justify the steps by filling in the blanks.

u + w = v + w                          Hypothesis

(u + w) + (-w) = (v + w) + (-w)        Add -w to both sides.

u + [w + (-w)] = v + [w + (-w)]        ______

u + 0 = v + 0                          ______

u = v                                  ______

24. Let v be any vector in a vector space V. Prove that 0v = 0.

25. Below is a seven-step proof of part (b) of Theorem 4.1.1. Justify each step either by stating that it is true by hypothesis or
by specifying which of the ten vector space axioms applies.

Hypothesis: Let u be any vector in a vector space V, let 0 be the zero vector in V, and let k be a scalar.

Conclusion: Then k0 = 0.

Proof:

(1) k0 + ku = k(0 + u)

(2) = ku

(3) Since ku is in V, -ku is in V.

(4) Therefore, (k0 + ku) + (-ku) = ku + (-ku).

(5) k0 + (ku + (-ku)) = ku + (-ku)

(6) k0 + 0 = 0

(7) k0 = 0

26. Let v be any vector in a vector space V. Prove that -v = (-1)v.

27. Prove: If u is a vector in a vector space V and k a scalar such that ku = 0, then either k = 0 or u = 0. [Suggestion: Show
that if ku = 0 and k ≠ 0, then u = 0. The result then follows as a logical consequence of this.]

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) A vector is a directed line segment (an arrow). 
Answer: 

False 

(b) A vector is an n-tuple of real numbers.
Answer: 

False 

(c) A vector is any element of a vector space. 
Answer: 

True 

(d) There is a vector space consisting of exactly two distinct vectors. 
Answer: 

False 

(e) The set of polynomials with degree exactly 1 is a vector space under the operations defined in Exercise 12. 
Answer: 

False 
4.2 Subspaces



It is possible for one vector space to be contained within another. We will explore this idea in this section, we 
will discuss how to recognize such vector spaces, and we will give a variety of examples that will be used in 
our later work. 

We will begin with some terminology. 

DEFINITION 1

A subset W of a vector space V is called a subspace of V if W is itself a vector space under the addition
and scalar multiplication defined on V.

In general, to show that a nonempty set W with two operations is a vector space one must verify the ten vector
space axioms. However, if W is a subset of a known vector space V, then certain axioms need not be verified
because they are "inherited" from V. For example, it is not necessary to verify that u + v = v + u holds in W
because it holds for all vectors in V including those in W. On the other hand, it is necessary to verify that W is
closed under addition and scalar multiplication since it is possible that adding two vectors in W or multiplying a
vector in W by a scalar produces a vector in V that is outside of W (Figure 4.2.1).




Figure 4.2.1 The vectors u and v are in W, but the vectors u + v and ku are not



Those axioms that are not inherited by W are

Axiom 1: Closure of W under addition

Axiom 4: Existence of a zero vector in W

Axiom 5: Existence of a negative in W for every vector in W

Axiom 6: Closure of W under scalar multiplication

so these must be verified to prove that it is a subspace of V. However, the following theorem shows that if
Axiom 1 and Axiom 6 hold in W, then Axioms 4 and 5 hold in W as a consequence and hence need not be
verified.



THEOREM 4.2.1 



If W is a set of one or more vectors in a vector space V, then W is a subspace of V if and only if the
following conditions hold.

(a) If u and v are vectors in W, then u + v is in W.

(b) If k is any scalar and u is any vector in W, then ku is in W.

In words, Theorem 4.2.1 states that W is a
subspace of V if and only if it is closed under
addition and scalar multiplication.

Proof If W is a subspace of V, then all the vector space axioms hold in W, including Axioms 1 and 6, which
are precisely conditions (a) and (b).

Conversely, assume that conditions (a) and (b) hold. Since these are Axioms 1 and 6, and since Axioms 2, 3, 7,
8, 9, and 10 are inherited from V, we only need to show that Axioms 4 and 5 hold in W. For this purpose, let u
be any vector in W. It follows from condition (b) that ku is a vector in W for every scalar k. In particular,
0u = 0 and (-1)u = -u are in W, which shows that Axioms 4 and 5 hold in W.
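Theorem 4.2.1 reduces the subspace question to two closure checks, and those checks are easy to try on sample vectors. The Python sketch below is illustrative only: the function name closed_on_samples, the two lines in R^2, and the sample points are assumptions made here. It compares the line y = 2x through the origin with the shifted line y = 2x + 1. A failure on samples shows that a set is not a subspace; success on samples is only evidence, not a proof.

```python
import itertools

def closed_on_samples(in_W, samples, scalars):
    """Check conditions (a) and (b) of Theorem 4.2.1 on sample vectors in R^2.

    Returns a pair (closed under addition, closed under scalar multiplication),
    judged only on the given samples and scalars.
    """
    add_ok = all(in_W((u[0] + v[0], u[1] + v[1]))
                 for u, v in itertools.product(samples, repeat=2))
    smul_ok = all(in_W((k * u[0], k * u[1]))
                  for k in scalars for u in samples)
    return add_ok, smul_ok

def on_line_through_origin(p):
    return p[1] == 2 * p[0]          # the line y = 2x

def on_shifted_line(p):
    return p[1] == 2 * p[0] + 1      # the line y = 2x + 1, which misses the origin

scalars = [-1, 0, 2]
print(closed_on_samples(on_line_through_origin, [(0, 0), (1, 2), (-3, -6)], scalars))
# (True, True)
print(closed_on_samples(on_shifted_line, [(0, 1), (1, 3), (-2, -3)], scalars))
# (False, False)
```

The shifted line fails both conditions; in particular, multiplying any of its points by the scalar 0 gives the origin, which is not on that line.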

Note that every vector space has at least two 
subspaces, itself and its zero subspace. 

EXAMPLE 1 The Zero Subspace

If V is any vector space, and if W = {0} is the subset of V that consists of the zero vector only,
then W is closed under addition and scalar multiplication since

0 + 0 = 0 and k0 = 0

for any scalar k. We call W the zero subspace of V.

EXAMPLE 2 Lines Through the Origin Are Subspaces of R^2 and of R^3

If W is a line through the origin of either R^2 or R^3, then adding two vectors on the line W or multiplying a vector
on the line W by a scalar produces another vector on the line W, so W is closed under addition and scalar
multiplication (see Figure 4.2.2 for an illustration in R^3).




(a) W is closed under addition. (b) W is closed under scalar multiplication.

Figure 4.2.2



EXAMPLE 3 Planes Through the Origin Are Subspaces of R^3

If u and v are vectors in a plane W through the origin of R^3, then it is evident geometrically that u + v
and ku lie in the same plane W for any scalar k (Figure 4.2.3). Thus W is closed under addition and
scalar multiplication.

Figure 4.2.3 The vectors u + v and ku both lie in the same plane as u and v

Table 1 that follows gives a list of subspaces of R^2 and of R^3 that we have encountered thus far. We will see
later that these are the only subspaces of R^2 and of R^3.

Table 1

Subspaces of R^2                 Subspaces of R^3

• {0}                            • {0}

• Lines through the origin       • Lines through the origin

• R^2                            • Planes through the origin

                                 • R^3



EXAMPLE 4 A Subset of R^2 That Is Not a Subspace

Let W be the set of all points (x, y) in R^2 for which x ≥ 0 and y ≥ 0 (the shaded region in Figure
4.2.4). This set is not a subspace of R^2 because it is not closed under scalar multiplication. For
example, v = (1, 1) is a vector in W, but (-1)v = (-1, -1) is not.

Figure 4.2.4 W is not closed under scalar multiplication



EXAMPLE 5 Subspaces of M_nn

We know from Theorem 1.7.2 that the sum of two symmetric n x n matrices is symmetric and
that a scalar multiple of a symmetric n x n matrix is symmetric. Thus, the set of symmetric n x n
matrices is closed under addition and scalar multiplication and hence is a subspace of M_nn.
Similarly, the sets of upper triangular matrices, lower triangular matrices, and diagonal matrices
are subspaces of M_nn.



EXAMPLE 6 A Subset of M_nn That Is Not a Subspace

The set W of invertible n x n matrices is not a subspace of M_nn, failing on two counts: it is not
closed under addition and not closed under scalar multiplication. We will illustrate this with an
example in M_22 that you can readily adapt to M_nn. Consider the matrices

U = [1 2]   and   V = [-1 2]
    [2 5]             [-2 5]

The matrix 0U is the 2 x 2 zero matrix and hence is not invertible, and the matrix U + V has a
column of zeros, so it also is not invertible.
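The failure of closure in Example 6 can be checked numerically. The following is a minimal sketch, assuming 2 x 2 matrices stored as nested lists; the helper names det2, add, and scale are hypothetical, and the matrices U and V are illustrative choices of two invertible matrices whose sum has a column of zeros.

```python
def det2(m):
    """Determinant of a 2x2 matrix stored as a list of two rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def add(a, b):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(k, a):
    """Scalar multiple k*a of a matrix."""
    return [[k * x for x in row] for row in a]

# Illustrative invertible matrices (an assumption made for this sketch).
U = [[1, 2], [2, 5]]
V = [[-1, 2], [-2, 5]]

print(det2(U), det2(V))      # both nonzero, so U and V are invertible
print(det2(add(U, V)))       # zero: U + V is not invertible
print(det2(scale(0, U)))     # zero: 0U is not invertible
```

A nonzero determinant is used here as the test for invertibility, so a single pair of matrices suffices to show that the set of invertible matrices is not closed under either operation.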



CALCULUS REQUIRED 

EXAMPLE 7 The Subspace C(-∞, ∞)

There is a theorem in calculus which states that a sum of continuous functions is continuous and
that a constant times a continuous function is continuous. Rephrased in vector language, the set
of continuous functions on (-∞, ∞) is a subspace of F(-∞, ∞). We will denote this
subspace by C(-∞, ∞).



CALCULUS REQUIRED 

EXAMPLE 8 Functions with Continuous Derivatives

A function with a continuous derivative is said to be continuously differentiable. There is a
theorem in calculus which states that the sum of two continuously differentiable functions is
continuously differentiable and that a constant times a continuously differentiable function is
continuously differentiable. Thus, the functions that are continuously differentiable on
(-∞, ∞) form a subspace of F(-∞, ∞). We will denote this subspace by
C^1(-∞, ∞), where the superscript emphasizes that the first derivative is continuous. To take
this a step further, the set of functions with m continuous derivatives on (-∞, ∞) is a
subspace of F(-∞, ∞), as is the set of functions with derivatives of all orders on
(-∞, ∞). We will denote these subspaces by C^m(-∞, ∞) and C^∞(-∞, ∞),
respectively.



EXAMPLE 9 The Subspace of All Polynomials

Recall that a polynomial is a function that can be expressed in the form

p(x) = a_0 + a_1 x + ... + a_n x^n (1)

where a_0, a_1, ..., a_n are constants. It is evident that the sum of two polynomials is a
polynomial and that a constant times a polynomial is a polynomial. Thus, the set W of all
polynomials is closed under addition and scalar multiplication and hence is a subspace of
F(-∞, ∞). We will denote this space by P_∞.



EXAMPLE 10 The Subspace of Polynomials of Degree ≤ n

Recall that the degree of a polynomial is the highest power of the variable that occurs with a
nonzero coefficient. Thus, for example, if a_n ≠ 0 in Formula 1, then that polynomial has degree n.
It is not true that the set W of polynomials with positive degree n is a subspace of F(-∞, ∞)
because that set is not closed under addition. For example, the polynomials

1 + 2x + 3x^2 and 5 + 7x - 3x^2

both have degree 2, but their sum has degree 1. What is true, however, is that for each nonnegative
integer n the polynomials of degree n or less form a subspace of F(-∞, ∞). We will denote
this space by P_n.
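The degree drop in Example 10 is easy to reproduce. The sketch below assumes a polynomial is stored as a coefficient list [a_0, a_1, ..., a_n]; the function names poly_add and degree are hypothetical, and degree follows this text's convention that every constant, including 0, has degree zero.

```python
def poly_add(p, q):
    """Add polynomials given as coefficient lists [a_0, a_1, ..., a_n]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad the shorter list with zero coefficients
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def degree(p):
    """Highest power with a nonzero coefficient; constants get degree 0."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return 0

p = [1, 2, 3]     # 1 + 2x + 3x^2
q = [5, 7, -3]    # 5 + 7x - 3x^2
s = poly_add(p, q)
print(s, degree(p), degree(q), degree(s))
```

Both summands have degree 2, while the sum 6 + 9x has degree 1, so the polynomials of degree exactly 2 are not closed under addition.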



In this text we regard all constants to be 
polynomials of degree zero. Be aware, however, 
that some authors do not assign a degree to the 
constant 0. 



The Hierarchy of Function Spaces 

It is proved in calculus that polynomials are continuous functions and have continuous derivatives of all orders
on (-∞, ∞). Thus, it follows that P_∞ is not only a subspace of F(-∞, ∞), as previously observed, but
is also a subspace of C^∞(-∞, ∞). We leave it for you to convince yourself that the vector spaces
discussed in Example 7 to Example 10 are "nested" one inside the other as illustrated in Figure 4.2.5.




Figure 4.2.5 



Remark In our previous examples, and as illustrated in Figure 4.2.5, we have only considered functions that
are defined at all points of the interval (-∞, ∞). Sometimes we will want to consider functions that are
only defined on some subinterval of (-∞, ∞), say the closed interval [a, b] or the open interval (a, b). In
such cases we will make an appropriate notation change. For example, C[a, b] is the space of continuous
functions on [a, b] and C(a, b) is the space of continuous functions on (a, b).

Building Subspaces 

The following theorem provides a useful way of creating a new subspace from known subspaces. 

THEOREM 4.2.2

If W_1, W_2, ..., W_r are subspaces of a vector space V, then the intersection of these subspaces is also a
subspace of V.



Note that the first step in proving Theorem 4.2.2
was to establish that W contained at least one
vector. This is important, for otherwise the
subsequent argument might be logically correct
but meaningless.



Proof Let W be the intersection of the subspaces W_1, W_2, ..., W_r. This set is not empty because each of these
subspaces contains the zero vector of V, and hence so does their intersection. Thus, it remains to show that W is
closed under addition and scalar multiplication.

To prove closure under addition, let u and v be vectors in W. Since W is the intersection of W_1, W_2, ..., W_r, it
follows that u and v also lie in each of these subspaces. Since these subspaces are all closed under addition,
they all contain the vector u + v and hence so does their intersection W. This proves that W is closed under
addition. We leave the proof that W is closed under scalar multiplication to you.

Sometimes we will want to find the "smallest" subspace of a vector space V that contains all of the vectors in
some set of interest. The following definition, which generalizes Definition 4 of Section 3.1, will help us to do
that.

If r = 1, then Equation 2 has the form
w = k_1 v_1, in which case the linear combination
is just a scalar multiple of v_1.

DEFINITION 2

If w is a vector in a vector space V, then w is said to be a linear combination of the vectors
v_1, v_2, ..., v_r in V if w can be expressed in the form

w = k_1 v_1 + k_2 v_2 + ... + k_r v_r (2)

where k_1, k_2, ..., k_r are scalars. These scalars are called the coefficients of the linear combination.



THEOREM 4.2.3

If S = {w_1, w_2, ..., w_r} is a nonempty set of vectors in a vector space V, then:

(a) The set W of all possible linear combinations of the vectors in S is a subspace of V.

(b) The set W in part (a) is the "smallest" subspace of V that contains all of the vectors in S in the sense
that any other subspace that contains those vectors contains W.



Proof (a) Let W be the set of all possible linear combinations of the vectors in S. We must show that W is
closed under addition and scalar multiplication. To prove closure under addition, let

u = c_1 w_1 + c_2 w_2 + ... + c_r w_r and v = k_1 w_1 + k_2 w_2 + ... + k_r w_r

be two vectors in W. It follows that their sum can be written as

u + v = (c_1 + k_1)w_1 + (c_2 + k_2)w_2 + ... + (c_r + k_r)w_r

which is a linear combination of the vectors in S. Thus, W is closed under addition. We leave it for you to prove
that W is also closed under scalar multiplication and hence is a subspace of V.

Proof (b) Let W' be any subspace of V that contains all of the vectors in S. Since W' is closed under addition
and scalar multiplication, it contains all linear combinations of the vectors in S and hence contains W.



The following definition gives some important notation and terminology related to Theorem 4.2.3. 

DEFINITION 3

The subspace of a vector space V that is formed from all possible linear combinations of the vectors in
a nonempty set S is called the span of S, and we say that the vectors in S span that subspace. If
S = {w_1, w_2, ..., w_r}, then we denote the span of S by

span{w_1, w_2, ..., w_r} or span(S)



EXAMPLE 11 The Standard Unit Vectors Span R^n

Recall that the standard unit vectors in R^n are

e_1 = (1, 0, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, 0, 0, ..., 1)

These vectors span R^n since every vector v = (v_1, v_2, ..., v_n) in R^n can be expressed as

v = v_1 e_1 + v_2 e_2 + ... + v_n e_n
which is a linear combination of e_1, e_2, ..., e_n. Thus, for example, the vectors

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)
span R^3 since every vector v = (a, b, c) in this space can be expressed as

v = (a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = ai + bj + ck



EXAMPLE 12 A Geometric View of Spanning in R^2 and R^3

(a) If v is a nonzero vector in R^2 or R^3 that has its initial point at the origin, then span{v}, which
is the set of all scalar multiples of v, is the line through the origin determined by v. You should
be able to visualize this from Figure 4.2.6a by observing that the tip of the vector kv can be
made to fall at any point on the line by choosing the value of k appropriately.




George William Hill (1838-1914)

Historical Note The terms linearly independent and linearly dependent were
introduced by Maxime Bocher (see p. 7) in his book Introduction to Higher Algebra,
published in 1907. The term linear combination is due to the American mathematician
G. W. Hill, who introduced it in a research paper on planetary motion published in
1900. Hill was a "loner" who preferred to work out of his home in West Nyack, New
York, rather than in academia, though he did try lecturing at Columbia University for a
few years. Interestingly, he apparently returned the teaching salary, indicating that he
did not need the money and did not want to be bothered looking after it. Although
technically a mathematician, Hill had little interest in modern developments of
mathematics and worked almost entirely on the theory of planetary orbits.
[Image: Courtesy of the American Mathematical Society]

(b) If v_1 and v_2 are nonzero vectors in R^3 that have their initial points at the origin and do not lie
on the same line, then span{v_1, v_2}, which consists of all linear combinations of v_1 and v_2, is the
plane through the origin determined by these two vectors. You should be able to visualize this from
Figure 4.2.6b by observing that the tip of the vector k_1 v_1 + k_2 v_2 can be made to fall at any point in the
plane by adjusting the scalars k_1 and k_2 to lengthen, shorten, or reverse the directions of the
vectors k_1 v_1 and k_2 v_2 appropriately.

Figure 4.2.6 (a) Span{v} is the line through the origin determined by v. (b) Span{v_1, v_2} is the plane
through the origin determined by v_1 and v_2.



EXAMPLE 13 A Spanning Set for P_n

The polynomials 1, x, x^2, ..., x^n span the vector space P_n defined in Example 10 since each
polynomial p in P_n can be written as

p = a_0 + a_1 x + ... + a_n x^n
which is a linear combination of 1, x, x^2, ..., x^n. We can denote this by writing

P_n = span{1, x, x^2, ..., x^n}



The next two examples are concerned with two important types of problems:

• Given a set S of vectors in R^n and a vector v in R^n, determine whether v is a linear combination of the
vectors in S.

• Given a set S of vectors in R^n, determine whether the vectors span R^n.

EXAMPLE 14 Linear Combinations

Consider the vectors u = (1, 2, -1) and v = (6, 4, 2) in R^3. Show that w = (9, 2, 7) is a
linear combination of u and v and that w' = (4, -1, 8) is not a linear combination of u and v.

Solution In order for w to be a linear combination of u and v, there must be scalars k_1 and k_2
such that w = k_1 u + k_2 v; that is,

(9, 2, 7) = k_1(1, 2, -1) + k_2(6, 4, 2)

or

(9, 2, 7) = (k_1 + 6k_2, 2k_1 + 4k_2, -k_1 + 2k_2)

Equating corresponding components gives

k_1 + 6k_2 = 9
2k_1 + 4k_2 = 2
-k_1 + 2k_2 = 7
Solving this system using Gaussian elimination yields k_1 = -3, k_2 = 2, so

w = -3u + 2v

Similarly, for w' to be a linear combination of u and v, there must be scalars k_1 and k_2 such that
w' = k_1 u + k_2 v; that is,

(4, -1, 8) = k_1(1, 2, -1) + k_2(6, 4, 2)

or

(4, -1, 8) = (k_1 + 6k_2, 2k_1 + 4k_2, -k_1 + 2k_2)

Equating corresponding components gives

k_1 + 6k_2 = 4
2k_1 + 4k_2 = -1
-k_1 + 2k_2 = 8

This system of equations is inconsistent (verify), so no such scalars k_1 and k_2 exist.
Consequently, w' is not a linear combination of u and v.
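The procedure of Example 14 (set up the system whose columns are the given vectors, then test it for consistency) can be sketched in code. This is one possible implementation, assuming exact rational arithmetic from Python's fractions module; the function names rref and combination_coefficients are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix and its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def combination_coefficients(vectors, target):
    """Return scalars k_i with sum(k_i * v_i) == target, or None if none exist."""
    n = len(vectors)
    # Columns of the augmented matrix: the given vectors, then the target.
    aug = [[v[i] for v in vectors] + [target[i]] for i in range(len(target))]
    m, pivots = rref(aug)
    if n in pivots:            # a pivot in the last column means inconsistency
        return None
    ks = [Fraction(0)] * n     # any free variables are set to 0
    for row, c in zip(m, pivots):
        ks[c] = row[n]
    return ks

u, v = (1, 2, -1), (6, 4, 2)
print(combination_coefficients([u, v], (9, 2, 7)))    # coefficients for w
print(combination_coefficients([u, v], (4, -1, 8)))   # None for w'
```

Using Fraction avoids rounding error, so the consistency test is exact for integer or rational input.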



EXAMPLE 15 Testing for Spanning

Determine whether v_1 = (1, 1, 2), v_2 = (1, 0, 1), and v_3 = (2, 1, 3) span the vector space R^3.

Solution We must determine whether an arbitrary vector b = (b_1, b_2, b_3) in R^3 can be
expressed as a linear combination

b = k_1 v_1 + k_2 v_2 + k_3 v_3

of the vectors v_1, v_2, and v_3. Expressing this equation in terms of components gives
(b_1, b_2, b_3) = k_1(1, 1, 2) + k_2(1, 0, 1) + k_3(2, 1, 3)

or

(b_1, b_2, b_3) = (k_1 + k_2 + 2k_3, k_1 + k_3, 2k_1 + k_2 + 3k_3)

or

k_1 + k_2 + 2k_3 = b_1
k_1 + k_3 = b_2
2k_1 + k_2 + 3k_3 = b_3

Thus, our problem reduces to ascertaining whether this system is consistent for all values of b_1,
b_2, and b_3. One way of doing this is to use parts (e) and (g) of Theorem 2.3.8, which state that
the system is consistent for every choice of b_1, b_2, and b_3 if and only if its coefficient matrix

A = [1 1 2]
    [1 0 1]
    [2 1 3]

has a nonzero determinant. But this is not the case here; we leave it for you to confirm that
det(A) = 0, so v_1, v_2, and v_3 do not span R^3.
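The determinant test used in Example 15 can be sketched as follows. The helper names det3 and spans_r3 are hypothetical; the sketch places the three vectors in the columns of the coefficient matrix, as in the example, and reports whether the determinant is nonzero.

```python
def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def spans_r3(v1, v2, v3):
    """True when three vectors span R^3, i.e. the matrix with columns v1, v2, v3 has nonzero determinant."""
    a = [[v1[i], v2[i], v3[i]] for i in range(3)]
    return det3(a) != 0

print(spans_r3((1, 1, 2), (1, 0, 1), (2, 1, 3)))   # the vectors of Example 15
print(spans_r3((1, 0, 0), (0, 1, 0), (0, 0, 1)))   # the standard unit vectors
```

This shortcut applies only to exactly three vectors in R^3 (more generally, n vectors in R^n); for other counts the coefficient matrix is not square and a rank computation is needed instead.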



Solution Spaces of Homogeneous Systems 

The solutions of a homogeneous linear system Ax = 0 of m equations in n unknowns can be viewed as vectors
in R^n. The following theorem provides a useful insight into the geometric structure of the solution set.



THEOREM 4.2.4

The solution set of a homogeneous linear system Ax = 0 in n unknowns is a subspace of R^n.

Proof Let W be the solution set for the system. The set W is not empty because it contains at least the trivial
solution x = 0.

To show that W is a subspace of R^n, we must show that it is closed under addition and scalar multiplication. To
do this, let x_1 and x_2 be vectors in W. Since these vectors are solutions of Ax = 0, we have

Ax_1 = 0 and Ax_2 = 0

It follows from these equations and the distributive property of matrix multiplication that

A(x_1 + x_2) = Ax_1 + Ax_2 = 0 + 0 = 0
so W is closed under addition. Similarly, if k is any scalar then

A(kx_1) = kAx_1 = k0 = 0
so W is also closed under scalar multiplication.

Because the solution set of a homogeneous
system in n unknowns is actually a subspace of
R^n, we will generally refer to it as the solution
space of the system.

EXAMPLE 16 Solution Spaces of Homogeneous Systems

Consider the linear systems

(a) [1 -2  3] [x]   [0]
    [2 -4  6] [y] = [0]
    [3 -6  9] [z]   [0]

(b) [ 1 -2  3] [x]   [0]
    [-3  7 -8] [y] = [0]
    [-2  4 -6] [z]   [0]

(c) [ 1 -2  3] [x]   [0]
    [-3  7 -8] [y] = [0]
    [ 4  1  2] [z]   [0]

(d) [0 0 0] [x]   [0]
    [0 0 0] [y] = [0]
    [0 0 0] [z]   [0]

Solution

(a) We leave it for you to verify that the solutions are

x = 2s - 3t, y = s, z = t

from which it follows that

x = 2y - 3z or x - 2y + 3z = 0

This is the equation of a plane through the origin that has n = (1, -2, 3) as a normal.

(b) We leave it for you to verify that the solutions are

x = -5t, y = -t, z = t

which are parametric equations for the line through the origin that is parallel to the vector
v = (-5, -1, 1).

(c) We leave it for you to verify that the only solution is x = 0, y = 0, z = 0, so the solution
space is {0}.

(d) This linear system is satisfied by all real values of x, y, and z, so the solution space is all of
R^3.


Remark Whereas the solution set of every homogeneous system of m equations in n unknowns is a subspace
of R^n, it is never true that the solution set of a nonhomogeneous system of m equations in n unknowns is a
subspace of R^n. There are two possible scenarios: first, the system may not have any solutions at all, and
second, if there are solutions, then the solution set will not be closed either under addition or under scalar
multiplication (Exercise 18).



A Concluding Observation 

It is important to recognize that spanning sets are not unique. For example, any nonzero vector on the line in
Figure 4.2.6a will span that line, and any two noncollinear vectors in the plane in Figure 4.2.6b will span that
plane. The following theorem, whose proof we leave as an exercise, states conditions under which two sets of
vectors will span the same space.



THEOREM 4.2.5

If S = {v_1, v_2, ..., v_r} and S' = {w_1, w_2, ..., w_k} are nonempty sets of vectors in a vector space V,
then

span{v_1, v_2, ..., v_r} = span{w_1, w_2, ..., w_k}

if and only if each vector in S is a linear combination of those in S', and each vector in S' is a linear
combination of those in S.
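For vectors in R^n, the condition in Theorem 4.2.5 can be tested with a rank computation. This is a sketch of a reformulation, not the method of the text: every vector of S' is a linear combination of those in S exactly when appending S' to S does not raise the rank, and symmetrically with the roles reversed. The function names rank and same_span and the sample sets are illustrative assumptions.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def same_span(s1, s2):
    """True when the two lists of vectors span the same subspace of R^n."""
    combined = rank(list(s1) + list(s2))
    return rank(s1) == combined and rank(s2) == combined

# Illustrative sets: both span the xy-plane in R^3.
print(same_span([(1, 0, 0), (0, 1, 0)], [(1, 1, 0), (1, -1, 0)]))
# Two different lines through the origin do not coincide.
print(same_span([(1, 0, 0)], [(0, 1, 0)]))
```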



Concept Review 

• Subspace



• Zero subspace 

• Examples of subspaces 

• Linear combination 

• Span 

• Solution space 

Skills 

• Determine whether a subset of a vector space is a subspace. 

• Show that a subset of a vector space is a subspace. 

• Show that a nonempty subset of a vector space is not a subspace by demonstrating that the set is 
either not closed under addition or not closed under scalar multiplication. 

• Given a set S of vectors in R^n and a vector v in R^n, determine whether v is a linear combination of
the vectors in S.

• Given a set S of vectors in R^n, determine whether the vectors in S span R^n.

• Determine whether two nonempty sets of vectors in a vector space V span the same subspace of V. 



Exercise Set 4.2 

1. Use Theorem 4.2.1 to determine which of the following are subspaces of R^3.

(a) All vectors of the form (a, 0, 0).

(b) All vectors of the form (a, 1, 1).

(c) All vectors of the form (a, b, c), where b = a + c.

(d) All vectors of the form (a, b, c), where b = a + c + 1.

(e) All vectors of the form (a, b, 0).

Answer: 

(a), (c), (e) 

2. Use Theorem 4.2.1 to determine which of the following are subspaces of M_nn.

(a) The set of all diagonal n x n matrices.

(b) The set of all n x n matrices A such that det(A) = 0.

(c) The set of all n x n matrices A such that tr(A) = 0.

(d) The set of all symmetric n x n matrices.

(e) The set of all n x n matrices A such that A^T = -A.

(f) The set of all n x n matrices A for which Ax = 0 has only the trivial solution.

(g) The set of all n x n matrices A such that AB = BA for some fixed n x n matrix B.

3. Use Theorem 4.2.1 to determine which of the following are subspaces of P_3.
(a) All polynomials a_0 + a_1 x + a_2 x^2 + a_3 x^3 for which a_0 = 0.

(b) All polynomials a_0 + a_1 x + a_2 x^2 + a_3 x^3 for which a_0 + a_1 + a_2 + a_3 = 0.

(c) All polynomials of the form a_0 + a_1 x + a_2 x^2 + a_3 x^3 in which a_0, a_1, a_2, and a_3 are integers.

(d) All polynomials of the form a_0 + a_1 x, where a_0 and a_1 are real numbers.

Answer: 

(a),(b), (d) 

4. Which of the following are subspaces of F(-∞, ∞)?

(a) All functions f in F(-∞, ∞) for which f(0) = 0.

(b) All functions f in F(-∞, ∞) for which f(0) = 1.

(c) All functions f in F(-∞, ∞) for which f(-x) = f(x).

(d) All polynomials of degree 2.

5. Which of the following are subspaces of R^∞?

(a) All sequences v in R^∞ of the form v = (v, 0, v, 0, v, 0, ...).

(b) All sequences v in R^∞ of the form v = (v, 1, v, 1, v, 1, ...).

(c) All sequences v in R^∞ of the form v = (v, 2v, 4v, 8v, 16v, ...).

(d) All sequences in R^∞ whose components are 0 from some point on.

Answer: 

(a), (c), (d) 

6. A line L through the origin in R^3 can be represented by parametric equations of the form x = at, y = bt,
and z = ct. Use these equations to show that L is a subspace of R^3 by showing that if v_1 = (x_1, y_1, z_1) and
v_2 = (x_2, y_2, z_2) are points on L and k is any real number, then kv_1 and v_1 + v_2 are also points on L.

7. Which of the following are linear combinations of u = (0, -2, 2) and v = (1, 3, -1)?

(a) (2,2,2) 

(b) (3,1,5) 

(c) (0,4,5) 

(d) (0, 0, 0) 

Answer: 

(a),(b), (d) 

8. Express the following as linear combinations of u = (2, 1, 4), v = (1, —1,3), and w= (3, 2, 5). 

(a) (-9, -7, -15)

(b) (6,11,6) 

(c) (0,0,0) 

(d) (7,8,9) 

9. Which of the following are linear combinations of 



(a) r 6 -8] 
.-1 -8 J 



(b) 
(c) 
(d) 



0 0 
0 0 



1 



Answer: 

(a),(b), (c) 

10. In each part express the vector as a linear combination of p_1 = 2 + x + 4x^2, p_2 = 1 - x + 3x^2, and
p_3 = 3 + 2x + 5x^2.

(a) -9 - 7x - 15x^2

(b) 6 + 11x + 6x^2

(c) 0

(d) 7 + 8x + 9x^2

11. In each part, determine whether the given vectors span R^3.

(a) v_1 = (2, 2, 2), v_2 = (0, 0, 3), v_3 = (0, 1, 1)

(b) v_1 = (2, -1, 3), v_2 = (4, 1, 2), v_3 = (8, -1, 8)

(c) v_1 = (3, 1, 4), v_2 = (2, -3, 5), v_3 = (5, -2, 9), v_4 = (1, 4, -1)

(d) v_1 = (1, 2, 6), v_2 = (3, 4, 1), v_3 = (4, 3, 1), v_4 = (3, 3, 1)

Answer:

(a) The vectors span R^3.

(b) The vectors do not span R^3.

(c) The vectors do not span R^3.

(d) The vectors span R^3.

12. Suppose that vi = (2, 1, 0, 3), V2 = (3, - 1, 5, 2), and V3 = ( - 1, 0, 2, 1). Which of the following 
vectors are in span {vi, V2, V3} ? 

(a) (2, 3, -7, 3)

(b) (0, 0, 0, 0)

(c) (1, 1, 1, 1)

(d) (-4, 6, -13, 4)

13. Determine whether the following polynomials span P_2.

p_1 = 1 - x + 2x^2, p_2 = 3 + x,
p_3 = 5 - x + 4x^2, p_4 = -2 - 2x + 2x^2

Answer:

The polynomials do not span P_2.

14. Let f = cos^2 x and g = sin^2 x. Which of the following lie in the space spanned by f and g?

(a) cos 2x

(b) 3 + x^2

(c) 1

(d) sin x

(e) 0 

15. Determine whether the solution space of the system Ax = 0 is a line through the origin, a plane through the
origin, or the origin only. If it is a plane, find an equation for it. If it is a line, find parametric equations for
it.

(a) A = [-1  1  1]
        [ 3 -1  0]
        [ 2 -4 -5]

(b) A = [ 1 -2  3]
        [-3  6  9]
        [-2  4 -6]

(c) A = [1 2 3]
        [2 5 3]
        [1 0 8]

(d) A = [1  2 -6]
        [1  4  4]
        [3 10  6]

(e) A = [1 -1  1]
        [2 -1  4]
        [3  1 11]

(f) A = [1 -3 1]
        [2 -6 2]
        [3 -9 3]

Answer:

(a) Line; x = -(1/2)t, y = -(3/2)t, z = t

(b) Line; x = 2t, y = t, z = 0

(c) Origin

(d) Origin

(e) Line; x = -3t, y = -2t, z = t

(f) Plane; x - 3y + z = 0

16. (Calculus required) Show that the following sets of functions are subspaces of F(-∞, ∞).

(a) All continuous functions on (-∞, ∞).

(b) All differentiable functions on (-∞, ∞).

(c) All differentiable functions on (-∞, ∞) that satisfy f' + 2f = 0.

17. (Calculus required) Show that the set of continuous functions f = f(x) on [a, b] such that

∫_a^b f(x) dx = 0

is a subspace of C[a, b].

18. Show that the solution vectors of a consistent nonhomogeneous system of m linear equations in n
unknowns do not form a subspace of R^n.

19. Prove Theorem 4.2.5.

20. Use Theorem 4.2.5 to show that the vectors v_1 = (1, 6, 4), v_2 = (2, 4, -1), v_3 = (-1, 2, 5), and the
vectors w_1 = (1, -2, -5), w_2 = (0, 8, 9) span the same subspace of R^3.

True-False Exercises 

In parts (a)-(k) determine whether the statement is true or false, and justify your answer. 

(a) Every subspace of a vector space is itself a vector space. 
Answer: 

True 

(b) Every vector space is a subspace of itself.
Answer:

True

(c) Every subset of a vector space V that contains the zero vector in V is a subspace of V.
Answer:

False

(d) The set R^2 is a subspace of R^3.
Answer:

False

(e) The solution set of a consistent linear system Ax = b of m equations in n unknowns is a subspace of R^n.
Answer:

False

(f) The span of any finite set of vectors in a vector space is closed under addition and scalar multiplication. 
Answer: 

True 

(g) The intersection of any two subspaces of a vector space V is a subspace of V.
Answer:

True

(h) The union of any two subspaces of a vector space V is a subspace of V.
Answer:

False

(i) Two subsets of a vector space V that span the same subspace of V must be equal.
Answer:

False

(j) The set of upper triangular n x n matrices is a subspace of the vector space of all n x n matrices.
Answer:
True

(k) The polynomials x - 1, (x - 1)^2, and (x - 1)^3 span P_3.

Answer:

False
4.3 Linear Independence



In this section we will consider the question of whether the vectors in a given set are interrelated in the sense 
that one or more of them can be expressed as a linear combination of the others. This is important to know in 
applications because the existence of such relationships often signals that some kind of complication is likely 
to occur. 



Extraneous Vectors 

In a rectangular xy-coordinate system every vector in the plane can be expressed in exactly one way as a
linear combination of the standard unit vectors. For example, the only way to express the vector (3, 2) as a
linear combination of i = (1, 0) and j = (0, 1) is

(3, 2) = 3(1, 0) + 2(0, 1) = 3i + 2j (1)

(Figure 4.3.1). Suppose, however, that we were to introduce a third coordinate axis that makes an angle of 45°
with the x-axis. Call it the w-axis. As illustrated in Figure 4.3.2, the unit vector along the w-axis is

w = (1/√2, 1/√2)

Whereas Formula 1 shows the only way to express the vector (3, 2) as a linear combination of i and j, there
are infinitely many ways to express this vector as a linear combination of i, j, and w. Three possibilities are

(3, 2) = 3(1, 0) + 2(0, 1) + 0(1/√2, 1/√2) = 3i + 2j + 0w

(3, 2) = 2(1, 0) + (0, 1) + √2(1/√2, 1/√2) = 2i + j + √2 w

(3, 2) = 4(1, 0) + 3(0, 1) - √2(1/√2, 1/√2) = 4i + 3j - √2 w

In short, by introducing a superfluous axis we created the complication of having multiple ways of assigning
coordinates to points in the plane. What makes the vector w superfluous is the fact that it can be expressed as
a linear combination of the vectors i and j, namely,

w = (1/√2, 1/√2) = (1/√2)i + (1/√2)j

Thus, one of our main tasks in this section will be to develop ways of ascertaining whether one vector in a set
S is a linear combination of other vectors in S.
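The non-uniqueness described above can be confirmed numerically. The following sketch uses floating-point arithmetic, so equality is tested with a tolerance; the helper names combine and is_target are hypothetical, and the three coefficient triples are (3, 2, 0), (2, 1, √2), and (4, 3, -√2).

```python
import math

i = (1.0, 0.0)
j = (0.0, 1.0)
w = (1 / math.sqrt(2), 1 / math.sqrt(2))   # unit vector along the w-axis

def combine(a, b, c):
    """Return a*i + b*j + c*w as a pair of floats."""
    return tuple(a * ic + b * jc + c * wc for ic, jc, wc in zip(i, j, w))

def is_target(v, target=(3.0, 2.0)):
    """True when v agrees with the target vector up to rounding error."""
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(v, target))

r = math.sqrt(2)
for coeffs in [(3, 2, 0), (2, 1, r), (4, 3, -r)]:
    print(coeffs, is_target(combine(*coeffs)))
```

Each triple reproduces (3, 2), which illustrates why coordinates relative to i, j, and w are not unique.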



Figure 4.3.1 




Figure 4.3.2 



Linear Independence and Dependence 



We will often apply the terms linearly 
independent and linearly dependent to the 
vectors themselves rather than to the set. 



DEFINITION 1

If S = {v_1, v_2, ..., v_r} is a nonempty set of vectors in a vector space V, then the vector equation

k_1 v_1 + k_2 v_2 + ... + k_r v_r = 0

has at least one solution, namely,

k_1 = 0, k_2 = 0, ..., k_r = 0

We call this the trivial solution. If this is the only solution, then S is said to be a linearly independent
set. If there are solutions in addition to the trivial solution, then S is said to be a linearly dependent
set.



EXAMPLE 1 Linear Independence of the Standard Unit Vectors in R^n

The most basic linearly independent set in R^n is the set of standard unit vectors

e_1 = (1, 0, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, 0, 0, ..., 1)

For notational simplicity, we will prove the linear independence in R^3 of

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

The linear independence or linear dependence of these vectors is determined by whether there exist nontrivial
solutions of the vector equation

k_1 i + k_2 j + k_3 k = 0 (2)

Since the component form of this equation is

(k_1, k_2, k_3) = (0, 0, 0)

it follows that k_1 = k_2 = k_3 = 0. This implies that 2 has only the trivial solution and hence that the vectors are
linearly independent.



EXAMPLE 2 Linear Independence in R^3

Determine whether the vectors

v_1 = (1, -2, 3), v_2 = (5, 6, -1), v_3 = (3, 2, 1)

are linearly independent or linearly dependent in R^3.

Solution The linear independence or linear dependence of these vectors is determined by
whether there exist nontrivial solutions of the vector equation

k_1 v_1 + k_2 v_2 + k_3 v_3 = 0 (3)

or, equivalently, of

k_1(1, -2, 3) + k_2(5, 6, -1) + k_3(3, 2, 1) = (0, 0, 0)
Equating corresponding components on the two sides yields the homogeneous linear system

k_1 + 5k_2 + 3k_3 = 0
-2k_1 + 6k_2 + 2k_3 = 0 (4)
3k_1 - k_2 + k_3 = 0

Thus, our problem reduces to determining whether this system has nontrivial solutions. There
are various ways to do this; one possibility is to simply solve the system, which yields

k_1 = -(1/2)t, k_2 = -(1/2)t, k_3 = t

(we omit the details). This shows that the system has nontrivial solutions and hence that the
vectors are linearly dependent. A second method for obtaining the same result is to compute the
determinant of the coefficient matrix

A = [ 1  5 3]
    [-2  6 2]
    [ 3 -1 1]

and use parts (b) and (g) of Theorem 2.3.8. We leave it for you to verify that det(A) = 0, from
which it follows that 3 has nontrivial solutions and the vectors are linearly dependent.

In Example 2, what relationship do you see
between the components of v_1, v_2, and v_3 and
the columns of the coefficient matrix A?

EXAMPLE 3 Linear Independence in R^4

Determine whether the vectors

v_1 = (1, 2, 2, -1), v_2 = (4, 9, 9, -4), v_3 = (5, 8, 9, -5)

in R^4 are linearly dependent or linearly independent.

Solution The linear independence or linear dependence of these vectors is determined by
whether there exist nontrivial solutions of the vector equation

k_1 v_1 + k_2 v_2 + k_3 v_3 = 0

or, equivalently, of

k_1(1, 2, 2, -1) + k_2(4, 9, 9, -4) + k_3(5, 8, 9, -5) = (0, 0, 0, 0)

Equating corresponding components on the two sides yields the homogeneous linear system

k_1 + 4k_2 + 5k_3 = 0
2k_1 + 9k_2 + 8k_3 = 0
2k_1 + 9k_2 + 9k_3 = 0
-k_1 - 4k_2 - 5k_3 = 0
We leave it for you to show that this system has only the trivial solution

k_1 = 0, k_2 = 0, k_3 = 0
from which you can conclude that v_1, v_2, and v_3 are linearly independent.
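The tests in Examples 2 and 3 can be automated with a rank computation. This is a sketch of an equivalent criterion rather than the method of the text: a set of vectors in R^n is linearly independent exactly when the matrix having those vectors as rows has rank equal to the number of vectors (the rank is the same whether the vectors are placed in rows or in columns). The function names rank and linearly_independent are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def linearly_independent(vectors):
    """True when the only solution of k_1 v_1 + ... + k_r v_r = 0 is the trivial one."""
    return rank(vectors) == len(vectors)

print(linearly_independent([(1, -2, 3), (5, 6, -1), (3, 2, 1)]))             # Example 2
print(linearly_independent([(1, 2, 2, -1), (4, 9, 9, -4), (5, 8, 9, -5)]))   # Example 3
```

Unlike the determinant test, this check does not require the number of vectors to equal the dimension of the space.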

EXAMPLE 4 An Important Linearly Independent Set in P_n

Show that the polynomials

1, x, x^2, ..., x^n

form a linearly independent set in P_n.

Solution For convenience, let us denote the polynomials as

p_0 = 1, p_1 = x, p_2 = x^2, ..., p_n = x^n

We must show that the vector equation

a_0 p_0 + a_1 p_1 + a_2 p_2 + ... + a_n p_n = 0 (5)

has only the trivial solution

a_0 = a_1 = a_2 = ... = a_n = 0

But 5 is equivalent to the statement that

a_0 + a_1 x + a_2 x^2 + ... + a_n x^n = 0 (6)

for all x in (-∞, ∞), so we must show that this holds if and only if each coefficient in 6 is zero.
To see that this is so, recall from algebra that a nonzero polynomial of degree n has at most n
distinct roots. That being the case, each coefficient in 6 must be zero, for otherwise the left side of
the equation would be a nonzero polynomial with infinitely many roots. Thus, 5 has only the
trivial solution.

The following example shows that the problem of determining whether a given set of vectors in P_n is linearly
independent or linearly dependent can be reduced to determining whether a certain set of vectors in R^(n+1) is
linearly dependent or independent.

EXAMPLE 5 Linear Independence of Polynomials

Determine whether the polynomials

p_1 = 1 - x, p_2 = 5 + 3x - 2x^2, p_3 = 1 + 3x - x^2

are linearly dependent or linearly independent in P_2.

Solution The linear independence or linear dependence of these vectors is determined by
whether there exist nontrivial solutions of the vector equation

k_1 p_1 + k_2 p_2 + k_3 p_3 = 0 (7)

This equation can be written as

k_1(1 - x) + k_2(5 + 3x - 2x^2) + k_3(1 + 3x - x^2) = 0 (8)

or, equivalently, as

(k_1 + 5k_2 + k_3) + (-k_1 + 3k_2 + 3k_3)x + (-2k_2 - k_3)x^2 = 0

Since this equation must be satisfied by all x in (-∞, ∞), each coefficient must be zero (as
explained in the previous example). Thus, the linear dependence or independence of the given
polynomials hinges on whether the following linear system has a nontrivial solution:

k_1 + 5k_2 + k_3 = 0
-k_1 + 3k_2 + 3k_3 = 0 (9)
-2k_2 - k_3 = 0

We leave it for you to show that this linear system has nontrivial solutions, either by solving it
directly or by showing that the coefficient matrix has determinant zero. Thus, the set
{p_1, p_2, p_3} is linearly dependent.
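The conclusion of Example 5 can be checked by exhibiting one nontrivial solution of system 9. The sketch below assumes each polynomial in P_2 is stored as its coefficient list [a_0, a_1, a_2]; the helper names poly_combination and det3 are hypothetical, and the coefficients k_1 = -3, k_2 = 1, k_3 = -2 were found by solving the system by hand (the third equation gives k_3 = -2k_2, and the first then gives k_1 = -3k_2).

```python
def poly_combination(ks, polys):
    """Coefficient list of k_1*p_1 + ... + k_r*p_r for equal-length coefficient lists."""
    return [sum(k * p[d] for k, p in zip(ks, polys)) for d in range(len(polys[0]))]

def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

p1 = [1, -1, 0]    # 1 - x
p2 = [5, 3, -2]    # 5 + 3x - 2x^2
p3 = [1, 3, -1]    # 1 + 3x - x^2

# Coefficient matrix of the system: its columns are the coefficient lists.
A = [[p1[d], p2[d], p3[d]] for d in range(3)]

print(det3(A))                                       # zero determinant
print(poly_combination([-3, 1, -2], [p1, p2, p3]))   # the zero polynomial
```

The columns of A are the coefficient vectors of p_1, p_2, and p_3, which is how the question about polynomials becomes a question about vectors in R^3.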



In Example 5, what relationship do you see 
between the coefficients of the given 
polynomials and the column vectors of the 
coefficient matrix of system 9? 



An Alternative Interpretation of Linear Independence 

The terms linearly dependent and linearly independent are intended to indicate whether the vectors in a given
set are interrelated in some way. The following theorem, whose proof is deferred to the end of this section,
makes this idea more precise.

THEOREM 4.3.1 

A set S with two or more vectors is 

(a) Linearly dependent if and only if at least one of the vectors in S is expressible as a linear 
combination of the other vectors in S. 

(b) Linearly independent if and only if no vector in S is expressible as a linear combination of the 
other vectors in S. 


EXAMPLE 6 Example 1 Revisited 

In Example 1 we showed that the standard unit vectors in $R^n$ are linearly independent. Thus, it 
follows from Theorem 4.3.1 that none of these vectors is expressible as a linear combination of 
the others. To illustrate this in $R^3$, suppose, for example, that 

$\mathbf{k} = k_1\mathbf{i} + k_2\mathbf{j}$ 

or in terms of components that 

$(0, 0, 1) = (k_1, k_2, 0)$ 

Since this equation cannot be satisfied by any values of $k_1$ and $k_2$, there is no way to express $\mathbf{k}$ 
as a linear combination of $\mathbf{i}$ and $\mathbf{j}$. Similarly, $\mathbf{i}$ is not expressible as a linear combination of $\mathbf{j}$ and 
$\mathbf{k}$, and $\mathbf{j}$ is not expressible as a linear combination of $\mathbf{i}$ and $\mathbf{k}$. 

EXAMPLE 7 Example 2 Revisited 

In Example 2 we saw that the vectors 

$\mathbf{v}_1 = (1, -2, 3), \quad \mathbf{v}_2 = (5, 6, -1), \quad \mathbf{v}_3 = (3, 2, 1)$ 

are linearly dependent. Thus, it follows from Theorem 4.3.1 that at least one of these vectors is 
expressible as a linear combination of the other two. We leave it for you to confirm that these 
vectors satisfy the equation 

$\tfrac{1}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}$ 

from which it follows, for example, that 

$\mathbf{v}_3 = \tfrac{1}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2$ 
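
The confirmation requested in Example 7 can be done with exact rational arithmetic. The following is a minimal sketch; the variable names are chosen here for illustration only.

```python
from fractions import Fraction

v1 = (1, -2, 3)
v2 = (5, 6, -1)
v3 = (3, 2, 1)

half = Fraction(1, 2)

# Componentwise value of (1/2) v1 + (1/2) v2.
combo = tuple(half * a + half * b for a, b in zip(v1, v2))

print(combo == v3)
```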



Sets with One or Two Vectors 



The following basic theorem is concerned with the linear independence and linear dependence of sets with 
one or two vectors and sets that contain the zero vector. 



THEOREM 4.3.2 



(a) A finite set that contains 0 is linearly dependent. 

(b) A set with exactly one vector is linearly independent if and only if that vector is not 0. 

(c) A set with exactly two vectors is linearly independent if and only if neither vector is a scalar 
multiple of the other. 




Jozef Hoene de Wronski (1778-1853) 



Historical Note The Polish-French mathematician Jozef Hoene de Wronski was born Jozef Hoene 
and adopted the name Wronski after he married. Wronski's life was fraught with controversy and 
conflict, which some say was due to his psychopathic tendencies and his exaggeration of the 
importance of his own work. Although Wronski's work was dismissed as rubbish for many years, and 
much of it was indeed erroneous, some of his ideas contained hidden brilliance and have survived. 
Among other things, Wronski designed a caterpillar vehicle to compete with trains (though it was 
never manufactured) and did research on the famous problem of determining the longitude of a ship at 
sea. His final years were spent in poverty. 
[Image: wikipedia] 



We will prove part (a) and leave the rest as exercises. 

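Proof (a) For any vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$, the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{0}\}$ is linearly dependent since the 
equation 

$0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_r + 1(\mathbf{0}) = \mathbf{0}$ 

expresses $\mathbf{0}$ as a linear combination of the vectors in $S$ with coefficients that are not all zero. 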

EXAMPLE 8 Linear Independence of Two Functions 

The functions $\mathbf{f}_1 = x$ and $\mathbf{f}_2 = \sin x$ are linearly independent vectors in $F(-\infty, \infty)$ since 
neither function is a scalar multiple of the other. On the other hand, the two functions 
$\mathbf{g}_1 = \sin 2x$ and $\mathbf{g}_2 = \sin x \cos x$ are linearly dependent because the trigonometric identity 
$\sin 2x = 2 \sin x \cos x$ reveals that $\mathbf{g}_1$ and $\mathbf{g}_2$ are scalar multiples of each other. 



A Geometric Interpretation of Linear Independence 



Linear independence has the following useful geometric interpretations in $R^2$ and $R^3$: 

• Two vectors in $R^2$ or $R^3$ are linearly independent if and only if they do not lie on the same line when they 
have their initial points at the origin. Otherwise one would be a scalar multiple of the other (Figure 4.3.3). 

(a) Linearly dependent (b) Linearly dependent (c) Linearly independent 

Figure 4.3.3 

• Three vectors in $R^3$ are linearly independent if and only if they do not lie in the same plane when they have 
their initial points at the origin. Otherwise at least one would be a linear combination of the other two 
(Figure 4.3.4). 




(a) Linearly dependent (b) Linearly dependent (c) Linearly independent 

Figure 4.3.4 

At the beginning of this section we observed that a third coordinate axis in $R^2$ is superfluous by showing that 
a unit vector along such an axis would have to be expressible as a linear combination of unit vectors along the 
positive $x$- and $y$-axes. That result is a consequence of the next theorem, which shows that there can be at most 
$n$ vectors in any linearly independent set in $R^n$. 

It follows from Theorem 4.3.3, for example, 
that a set in $R^2$ with more than two vectors is 
linearly dependent and a set in $R^3$ with more 
than three vectors is linearly dependent. 

THEOREM 4.3.3 

Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a set of vectors in $R^n$. If $r > n$, then $S$ is linearly dependent. 

Proof Suppose that 

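$\begin{aligned} \mathbf{v}_1 &= (v_{11}, v_{12}, \ldots, v_{1n}) \\ \mathbf{v}_2 &= (v_{21}, v_{22}, \ldots, v_{2n}) \\ &\ \vdots \\ \mathbf{v}_r &= (v_{r1}, v_{r2}, \ldots, v_{rn}) \end{aligned}$ 

and consider the equation 

$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}$ 

If we express both sides of this equation in terms of components and then equate the corresponding 
components, we obtain the system 

$\begin{aligned} v_{11}k_1 + v_{21}k_2 + \cdots + v_{r1}k_r &= 0 \\ v_{12}k_1 + v_{22}k_2 + \cdots + v_{r2}k_r &= 0 \\ &\ \vdots \\ v_{1n}k_1 + v_{2n}k_2 + \cdots + v_{rn}k_r &= 0 \end{aligned}$ 

This is a homogeneous system of $n$ equations in the $r$ unknowns $k_1, \ldots, k_r$. Since $r > n$, it follows from 
Theorem 1.2.2 that the system has nontrivial solutions. Therefore, $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly 
dependent set. 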



CALCULUS REQUIRED 

Linear Independence of Functions 

Sometimes linear dependence of functions can be deduced from known identities. For example, the functions 

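$\mathbf{f}_1 = \sin^2 x, \quad \mathbf{f}_2 = \cos^2 x, \quad \text{and} \quad \mathbf{f}_3 = 5$ 

form a linearly dependent set in $F(-\infty, \infty)$, since the equation 

$5\mathbf{f}_1 + 5\mathbf{f}_2 - \mathbf{f}_3 = 5\sin^2 x + 5\cos^2 x - 5 = 5(\sin^2 x + \cos^2 x) - 5 = 0$ 

expresses $\mathbf{0}$ as a linear combination of $\mathbf{f}_1$, $\mathbf{f}_2$, and $\mathbf{f}_3$ with coefficients that are not all zero. 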

Unfortunately, there is no general method that can be used to determine whether a set of functions is linearly 
independent or linearly dependent. However, there does exist a theorem that is useful for establishing linear 
independence in certain circumstances. The following definition will be useful for discussing that theorem. 

DEFINITION 2 

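If $\mathbf{f}_1 = f_1(x), \mathbf{f}_2 = f_2(x), \ldots, \mathbf{f}_n = f_n(x)$ are functions that are $n - 1$ times differentiable on the 
interval $(-\infty, \infty)$, then the determinant 

$W(x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}$ 

is called the Wronskian of $f_1, f_2, \ldots, f_n$. 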



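Suppose for the moment that $\mathbf{f}_1 = f_1(x), \mathbf{f}_2 = f_2(x), \ldots, \mathbf{f}_n = f_n(x)$ are linearly dependent vectors in 
$C^{(n-1)}(-\infty, \infty)$. This implies that for certain values of the coefficients the vector equation 

$k_1\mathbf{f}_1 + k_2\mathbf{f}_2 + \cdots + k_n\mathbf{f}_n = \mathbf{0}$ 

has a nontrivial solution, or equivalently that the equation 

$k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) = 0$ 

is satisfied for all $x$ in $(-\infty, \infty)$. Using this equation together with those that result by differentiating it 
$n - 1$ times yields the linear system 

$\begin{aligned} k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) &= 0 \\ k_1 f_1'(x) + k_2 f_2'(x) + \cdots + k_n f_n'(x) &= 0 \\ &\ \vdots \\ k_1 f_1^{(n-1)}(x) + k_2 f_2^{(n-1)}(x) + \cdots + k_n f_n^{(n-1)}(x) &= 0 \end{aligned}$ 

Thus, the linear dependence of $\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_n$ implies that the linear system 

$\begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$ (10) 

has a nontrivial solution for every $x$ in $(-\infty, \infty)$. But this implies that the determinant of the coefficient matrix of (10) is zero for every 
such $x$. Since this determinant is the Wronskian of $f_1, f_2, \ldots, f_n$, we have established the following result. 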



THEOREM 4.3.4 



If the functions $\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_n$ have $n - 1$ continuous derivatives on the interval $(-\infty, \infty)$, and if the 
Wronskian of these functions is not identically zero on $(-\infty, \infty)$, then these functions form a 
linearly independent set of vectors in $C^{(n-1)}(-\infty, \infty)$. 



In Example 8 we showed that $x$ and $\sin x$ are linearly independent functions by observing that neither is a 
scalar multiple of the other. The following example shows how to obtain the same result using the Wronskian 
(though it is a more complicated procedure in this particular case). 

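EXAMPLE 9 Linear Independence Using the Wronskian 

Use the Wronskian to show that $\mathbf{f}_1 = x$ and $\mathbf{f}_2 = \sin x$ are linearly independent. 

Solution The Wronskian is 

$W(x) = \begin{vmatrix} x & \sin x \\ 1 & \cos x \end{vmatrix} = x\cos x - \sin x$ 

This function is not identically zero on the interval $(-\infty, \infty)$ since, for example, 

$W\left(\tfrac{\pi}{2}\right) = \tfrac{\pi}{2}\cos\left(\tfrac{\pi}{2}\right) - \sin\left(\tfrac{\pi}{2}\right) = -1$ 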

Thus, the functions are linearly independent. 
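
As a numerical sanity check on Example 9, the $2 \times 2$ Wronskian can be evaluated at a sample point. This is a sketch only: the helper `wronskian_2` is a name introduced here, and the derivatives are supplied by hand rather than computed symbolically.

```python
import math

def wronskian_2(f1, df1, f2, df2, x):
    """2x2 Wronskian f1(x)*f2'(x) - f2(x)*f1'(x) evaluated at x."""
    return f1(x) * df2(x) - f2(x) * df1(x)

# f1(x) = x with derivative 1; f2(x) = sin x with derivative cos x.
w = wronskian_2(lambda x: x, lambda x: 1.0, math.sin, math.cos, math.pi / 2)

print(w)  # close to -1, up to floating-point rounding
```

A single point where the Wronskian is nonzero is enough for Theorem 4.3.4 to apply.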



WARNING 



The converse of Theorem 4.3.4 is false. If the 
Wronskian of $\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_n$ is identically zero 
on $(-\infty, \infty)$, then no conclusion can be 
reached about the linear independence of 
$\{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_n\}$; this set of vectors may be 
linearly independent or linearly dependent. 

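EXAMPLE 10 Linear Independence Using the Wronskian 

Use the Wronskian to show that $\mathbf{f}_1 = 1$, $\mathbf{f}_2 = e^x$, and $\mathbf{f}_3 = e^{2x}$ are linearly independent. 

Solution The Wronskian is 

$W(x) = \begin{vmatrix} 1 & e^x & e^{2x} \\ 0 & e^x & 2e^{2x} \\ 0 & e^x & 4e^{2x} \end{vmatrix} = 2e^{3x}$ 

This function is obviously not identically zero on $(-\infty, \infty)$, so $\mathbf{f}_1$, $\mathbf{f}_2$, and $\mathbf{f}_3$ form a linearly 
independent set. 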
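
The closed form $W(x) = 2e^{3x}$ for the functions $1$, $e^x$, $e^{2x}$ can be spot-checked numerically. The sketch below builds the Wronskian matrix from hand-computed derivatives and compares its determinant with $2e^{3x}$ at a few sample points; `det3` and `wronskian` are illustrative names, not part of the text.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def wronskian(x):
    """Wronskian of 1, e^x, e^(2x) at x, using derivatives worked out by hand."""
    e1, e2 = math.exp(x), math.exp(2 * x)
    rows = [[1.0, e1, e2],        # the functions themselves
            [0.0, e1, 2 * e2],    # first derivatives
            [0.0, e1, 4 * e2]]    # second derivatives
    return det3(rows)

samples = [-1.0, 0.0, 1.5]
values = [wronskian(x) for x in samples]
print(values)
```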

OPTIONAL 

We will close this section by proving part (a) of Theorem 4.3.1. We will leave the proof of part (b) as an 
exercise. 

Proof of Theorem 4.3.1(a) Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a set with two or more vectors. If we assume 
that $S$ is linearly dependent, then there are scalars $k_1, k_2, \ldots, k_r$, not all zero, such that 

$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}$ (11) 

To be specific, suppose that $k_1 \neq 0$. Then (11) can be rewritten as 

$\mathbf{v}_1 = \left(-\dfrac{k_2}{k_1}\right)\mathbf{v}_2 + \cdots + \left(-\dfrac{k_r}{k_1}\right)\mathbf{v}_r$ 

which expresses $\mathbf{v}_1$ as a linear combination of the other vectors in $S$. Similarly, if $k_j \neq 0$ in (11) for some 
$j = 2, 3, \ldots, r$, then $\mathbf{v}_j$ is expressible as a linear combination of the other vectors in $S$. 

Conversely, let us assume that at least one of the vectors in $S$ is expressible as a linear combination of the 
other vectors. To be specific, suppose that 

$\mathbf{v}_1 = c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + \cdots + c_r\mathbf{v}_r$ 

so 

$\mathbf{v}_1 - c_2\mathbf{v}_2 - c_3\mathbf{v}_3 - \cdots - c_r\mathbf{v}_r = \mathbf{0}$ 

It follows that $S$ is linearly dependent since the equation 

$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}$ 

is satisfied by 

$k_1 = 1, \quad k_2 = -c_2, \quad \ldots, \quad k_r = -c_r$ 

which are not all zero. The proof in the case where some vector other than $\mathbf{v}_1$ is expressible as a linear 
combination of the other vectors in $S$ is similar. 



Concept Review 

• Trivial solution 

• Linearly independent set 

• Linearly dependent set 

• Wronskian 

Skills 

• Determine whether a set of vectors is linearly independent or linearly dependent. 

• Express one vector in a linearly dependent set as a linear combination of the other vectors in the set. 
• Use the Wronskian to show that a set of functions is linearly independent. 



Exercise Set 4.3 

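1. Explain why the following are linearly dependent sets of vectors. (Solve this problem by inspection.) 

(a) $\mathbf{u}_1 = (-1, 2, 4)$ and $\mathbf{u}_2 = (5, -10, -20)$ in $R^3$ 

(b) $\mathbf{u}_1 = (3, -1)$, $\mathbf{u}_2 = (4, 5)$, $\mathbf{u}_3 = (-4, 7)$ in $R^2$ 

(c) $\mathbf{p}_1 = 3 - 2x + x^2$ and $\mathbf{p}_2 = 6 - 4x + 2x^2$ in $P_2$ 

(d) $A = \begin{bmatrix} -3 & 4 \\ 2 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & -4 \\ -2 & 0 \end{bmatrix}$ in $M_{22}$ 

Answer: 

(a) $\mathbf{u}_2$ is a scalar multiple of $\mathbf{u}_1$. 

(b) The vectors are linearly dependent by Theorem 4.3.3. 

(c) $\mathbf{p}_2$ is a scalar multiple of $\mathbf{p}_1$. 

(d) $B$ is a scalar multiple of $A$. 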

2. Which of the following sets of vectors in $R^3$ are linearly dependent? 

(a) $(4, -1, 2)$, $(-4, 10, 2)$ 

(b) $(-3, 0, 4)$, $(5, -1, 2)$, $(1, 1, 3)$ 

(c) $(8, -1, 3)$, $(4, 0, 1)$ 

(d) $(-2, 0, 1)$, $(3, 2, 5)$, $(6, -1, 1)$, $(7, 0, -2)$ 

3. Which of the following sets of vectors in $R^4$ are linearly dependent? 

(a) $(3, 8, 7, -3)$, $(1, 5, 3, -1)$, $(2, -1, 2, 6)$, $(1, 4, 0, 3)$ 

(b) $(0, 0, 2, 2)$, $(3, 3, 0, 0)$, $(1, 1, 0, -1)$ 

(c) $(0, 3, -3, -6)$, $(-2, 0, 0, -6)$, $(0, -4, -2, -2)$, $(0, -8, 4, -4)$ 

(d) $(3, 0, -3, 6)$, $(0, 2, 3, 1)$, $(0, -2, -2, 0)$, $(-2, 1, 2, 1)$ 

Answer: 

None 

4. Which of the following sets of vectors in $P_2$ are linearly dependent? 

(a) $2 - x + 4x^2$, $3 + 6x + 2x^2$, $2 + 10x - 4x^2$ 

(b) $3 + x + x^2$, $2 - x + 5x^2$, $4 - 3x^2$ 

(c) e-x^ 

(d) $1 + 3x + 3x^2$, $x + 4x^2$, $5 + 6x + 3x^2$, $7 + 2x - x^2$ 

5. Assume that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are vectors in $R^3$ that have their initial points at the origin. In each part, 
determine whether the three vectors lie in a plane. 

(a) $\mathbf{v}_1 = (2, -2, 0)$, $\mathbf{v}_2 = (6, 1, 4)$, $\mathbf{v}_3 = (2, 0, -4)$ 

(b) $\mathbf{v}_1 = (-6, 7, 2)$, $\mathbf{v}_2 = (3, 2, 4)$, $\mathbf{v}_3 = (4, -1, 2)$ 

Answer: 

(a) They do not lie in a plane. 

(b) They do lie in a plane. 

6. Assume that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are vectors in $R^3$ that have their initial points at the origin. In each part, 
determine whether the three vectors lie on the same line. 

(a) $\mathbf{v}_1 = (-1, 2, 3)$, $\mathbf{v}_2 = (2, -4, -6)$, $\mathbf{v}_3 = (-3, 6, 0)$ 

(b) $\mathbf{v}_1 = (2, -1, 4)$, $\mathbf{v}_2 = (4, 2, 3)$, $\mathbf{v}_3 = (2, 7, -6)$ 

(c) $\mathbf{v}_1 = (4, 6, 8)$, $\mathbf{v}_2 = (2, 3, 4)$, $\mathbf{v}_3 = (-2, -3, -4)$ 

7. (a) Show that the three vectors $\mathbf{v}_1 = (0, 3, 1, -1)$, $\mathbf{v}_2 = (6, 0, 5, 1)$, and $\mathbf{v}_3 = (4, -7, 1, 3)$ form a 
linearly dependent set in $R^4$. 

(b) Express each vector in part (a) as a linear combination of the other two. 

Answer: 

(b) $\mathbf{v}_1 = \tfrac{2}{7}\mathbf{v}_2 - \tfrac{3}{7}\mathbf{v}_3$, $\mathbf{v}_2 = \tfrac{7}{2}\mathbf{v}_1 + \tfrac{3}{2}\mathbf{v}_3$, $\mathbf{v}_3 = -\tfrac{7}{3}\mathbf{v}_1 + \tfrac{2}{3}\mathbf{v}_2$ 

8. (a) Show that the three vectors $\mathbf{v}_1 = (1, 2, 3, 4)$, $\mathbf{v}_2 = (0, 1, 0, -1)$, and $\mathbf{v}_3 = (1, 3, 3, 3)$ form a 
linearly dependent set in $R^4$. 

(b) Express each vector in part (a) as a linear combination of the other two. 

9. For which real values of $\lambda$ do the following vectors form a linearly dependent set in $R^3$? 



10. Show that if $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of vectors, then so are 
$\{\mathbf{v}_1, \mathbf{v}_2\}$, $\{\mathbf{v}_1, \mathbf{v}_3\}$, $\{\mathbf{v}_2, \mathbf{v}_3\}$, $\{\mathbf{v}_1\}$, $\{\mathbf{v}_2\}$, and $\{\mathbf{v}_3\}$. 

11. Show that if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set of vectors, then so is every nonempty 
subset of $S$. 

12. Show that if $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly dependent set of vectors in a vector space $V$, and $\mathbf{v}_4$ is any 
vector in $V$ that is not in $S$, then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is also linearly dependent. 

13. Show that if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly dependent set of vectors in a vector space $V$, and if 
$\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n$ are any vectors in $V$ that are not in $S$, then $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is also linearly 
dependent. 

14. Show that in $P_2$ every set with more than three vectors is linearly dependent. 

15. Show that if $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly independent and $\mathbf{v}_3$ does not lie in span$\{\mathbf{v}_1, \mathbf{v}_2\}$, then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is 
linearly independent. 

16. Prove: For any vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in a vector space $V$, the vectors $\mathbf{u} - \mathbf{v}$, $\mathbf{v} - \mathbf{w}$, and $\mathbf{w} - \mathbf{u}$ form a 
linearly dependent set. 

17. Prove: The space spanned by two vectors in $R^3$ is a line through the origin, a plane through the origin, or 
the origin itself. 

18. Under what conditions is a set with one vector linearly independent? 

19. Are the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ in part (a) of the accompanying figure linearly independent? What about 
those in part (b)? Explain. 




Answer: 



A=-i A 



1 




(a) 



Figure Ex-19 



Answer: 



(a) They are linearly independent since $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ do not lie in the same plane when they are placed 
with their initial points at the origin. 

(b) They are not linearly independent since $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ lie in the same plane when they are placed 
with their initial points at the origin. 

20. By using appropriate identities, where required, determine which of the following sets of vectors in 
$F(-\infty, \infty)$ are linearly dependent. 

(a) $6$, $3\sin^2 x$, $2\cos^2 x$ 

(b) cos X 

(c) $1$, $\sin x$, $\sin 2x$ 

(d) $\cos 2x$, $\sin^2 x$, $\cos^2 x$ 

(e) (3-;^)^, x^^ex, 5 

(f) 0, COS nx, sin 3ir^ 

21. The functions $f_1(x) = x$ and $f_2(x) = \cos x$ are linearly independent in $F(-\infty, \infty)$ because neither 
function is a scalar multiple of the other. Confirm the linear independence using Wronski's test. 

Answer: 

$W(x) = -x\sin x - \cos x \neq 0$ for some $x$. 

22. The functions $f_1(x) = \sin x$ and $f_2(x) = \cos x$ are linearly independent in $F(-\infty, \infty)$ because 
neither function is a scalar multiple of the other. Confirm the linear independence using Wronski's test. 

23. (Calculus required) Use the Wronskian to show that the following sets of vectors are linearly 
independent. 

(a) $1$, $x$, $e^x$ 

(b) $1$, $x$, $x^2$ 

Answer: 

(a) $W(x) = e^x \neq 0$ 

(b) $W(x) = 2 \neq 0$ 

24. Show that the functions $f_1(x) = e^x$, $f_2(x) = xe^x$, and $f_3(x) = x^2 e^x$ are linearly independent. 

25. Show that the functions $f_1(x) = \sin x$, $f_2(x) = \cos x$, and $f_3(x) = x\cos x$ are linearly independent. 

Answer: 

$W(x) = 2\sin x \neq 0$ for some $x$. 

26. Use part (a) of Theorem 4.3.1 to prove part (b). 



27. Prove part (b) of Theorem 4.3.2. 

28. (a) In Example 1 we showed that the mutually orthogonal vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ form a linearly independent 
set of vectors in $R^3$. Do you think that every set of three nonzero mutually orthogonal vectors in $R^3$ is 
linearly independent? Justify your conclusion with a geometric argument. 

(b) Justify your conclusion with an algebraic argument. [Hint: Use dot products.] 

True-False Exercises 

In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 

(a) A set containing a single vector is linearly independent. 
Answer: 

False 

(b) The set of vectors $\{\mathbf{v}, k\mathbf{v}\}$ is linearly dependent for every scalar $k$. 
Answer: 

True 

(c) Every linearly dependent set contains the zero vector. 
Answer: 

False 

(d) If the set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent, then $\{k\mathbf{v}_1, k\mathbf{v}_2, k\mathbf{v}_3\}$ is also linearly 
independent for every nonzero scalar $k$. 

Answer: 

True 

(e) If $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are linearly dependent nonzero vectors, then at least one vector $\mathbf{v}_k$ is a unique linear 
combination of $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$. 

Answer: 

True 

(f) The set of $2 \times 2$ matrices that contain exactly two 1's and two 0's is a linearly independent set in $M_{22}$. 
Answer: 

False 

(g) The three polynomials $(x - 1)(x + 2)$, $x(x + 2)$, and $x(x - 1)$ are linearly independent. 
Answer: 

True 



(h) The functions $f_1$ and $f_2$ are linearly dependent if there is a real number $x$ so that 
$k_1 f_1(x) + k_2 f_2(x) = 0$ for some scalars $k_1$ and $k_2$. 

Answer: 

False 
4.4 Coordinates and Basis 



We usually think of a line as being one-dimensional, a plane as two-dimensional, and the space around us as 
three-dimensional. It is the primary goal of this section and the next to make this intuitive notion of dimension precise. 
In this section we will discuss coordinate systems in general vector spaces and lay the groundwork for a precise 
definition of dimension in the next section. 



Coordinate Systems in Linear Algebra 



In analytic geometry we learned to use rectangular coordinate systems to create a one-to-one correspondence 
between points in 2-space and ordered pairs of real numbers and between points in 3-space and ordered triples of 
real numbers (Figure 4.4.1). Although rectangular coordinate systems are common, they are not essential. For 
example, Figure 4.4.2 shows coordinate systems in 2-space and 3-space in which the coordinate axes are not 
mutually perpendicular. 




Coordinates of P in a rectangular coordinate system in 2-space. Coordinates of P in a rectangular coordinate system in 3-space. 

Figure 4.4.1 

Coordinates of P in a nonrectangular coordinate system in 2-space. Coordinates of P in a nonrectangular coordinate system in 3-space. 

Figure 4.4.2 

In linear algebra coordinate systems are commonly specified using vectors rather than coordinate axes. For 
example, in Figure 4.4.3 we have recreated the coordinate systems in Figure 4.4.2 by using unit vectors to identify 
the positive directions and then attaching coordinates to a point P using the scalar coefficients in the equations 



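$\overrightarrow{OP} = a\mathbf{u}_1 + b\mathbf{u}_2 \quad \text{and} \quad \overrightarrow{OP} = a\mathbf{u}_1 + b\mathbf{u}_2 + c\mathbf{u}_3$ 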



Figure 4.4.3 



Units of measurement are essential ingredients of any coordinate system. In geometry problems one tries to use 
the same unit of measurement on all axes to avoid distorting the shapes of figures. This is less important in 
applications where coordinates represent physical quantities with diverse units (for example, time in seconds on 
one axis and temperature in degrees Celsius on another axis). To allow for this level of generality, we will relax 
the requirement that unit vectors be used to identify the positive directions and require only that those vectors be 
linearly independent. We will refer to these as the "basis vectors" for the coordinate system. In summary, it is the 
directions of the basis vectors that establish the positive directions, and it is the lengths of the basis vectors that 
establish the spacing between the integer points on the axes (Figure 4.4.4). 





Equal spacing, perpendicular axes. Unequal spacing, perpendicular axes. Equal spacing, skew axes. Unequal spacing, skew axes. 

Figure 4.4.4 



Basis for a Vector Space 

The following definition will make the preceding ideas more precise and will enable us to extend the concept of a 
coordinate system to general vector spaces. 

Note that in Definition 1 we have required a basis 
to have finitely many vectors. Some authors call 
this a finite basis, but we will not use this 
terminology. 



DEFINITION 1 

If $V$ is any vector space and $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a finite set of vectors in $V$, then $S$ is called a basis for 
$V$ if the following two conditions hold: 

(a) $S$ is linearly independent. 

(b) $S$ spans $V$. 



If you think of a basis as describing a coordinate system for a vector space $V$, then part (a) of this definition 
guarantees that there is no interrelationship between the basis vectors, and part (b) guarantees that there are 
enough basis vectors to provide coordinates for all vectors in $V$. Here are some examples. 

EXAMPLE 1 The Standard Basis for $R^n$ 

Recall from Example 11 of Section 4.2 that the standard unit vectors 

$\mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, 0, \ldots, 1)$ 

span $R^n$ and from Example 1 of Section 4.3 that they are linearly independent. Thus, they form a 
basis for $R^n$ that we call the standard basis for $R^n$. In particular, 

$\mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad \mathbf{k} = (0, 0, 1)$ 

is the standard basis for $R^3$. 



EXAMPLE 2 The Standard Basis for $P_n$ 

Show that $S = \{1, x, x^2, \ldots, x^n\}$ is a basis for the vector space $P_n$ of polynomials of degree $n$ or 
less. 

Solution We must show that the polynomials in $S$ are linearly independent and span $P_n$. Let us 
denote these polynomials by 

$\mathbf{p}_0 = 1, \quad \mathbf{p}_1 = x, \quad \mathbf{p}_2 = x^2, \quad \ldots, \quad \mathbf{p}_n = x^n$ 

We showed in Example 13 of Section 4.2 that these vectors span $P_n$ and in Example 4 of Section 
4.3 that they are linearly independent. Thus, they form a basis for $P_n$ that we call the standard basis 
for $P_n$. 

EXAMPLE 3 Another Basis for $R^3$ 

Show that the vectors $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 9, 0)$, and $\mathbf{v}_3 = (3, 3, 4)$ form a basis for $R^3$. 

Solution We must show that these vectors are linearly independent and span $R^3$. To prove linear 
independence we must show that the vector equation 

$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{0}$ (1) 

has only the trivial solution; and to prove that the vectors span $R^3$ we must show that every vector 
$\mathbf{b} = (b_1, b_2, b_3)$ in $R^3$ can be expressed as 

$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{b}$ (2) 

By equating corresponding components on the two sides, these two equations can be expressed as 
the linear systems 

$\begin{aligned} c_1 + 2c_2 + 3c_3 &= 0 \\ 2c_1 + 9c_2 + 3c_3 &= 0 \\ c_1 + 4c_3 &= 0 \end{aligned} \quad \text{and} \quad \begin{aligned} c_1 + 2c_2 + 3c_3 &= b_1 \\ 2c_1 + 9c_2 + 3c_3 &= b_2 \\ c_1 + 4c_3 &= b_3 \end{aligned}$ (3) 

(verify). Thus, we have reduced the problem to showing that in (3) the homogeneous system has only 
the trivial solution and that the nonhomogeneous system is consistent for all values of $b_1$, $b_2$, and $b_3$. 
But the two systems have the same coefficient matrix 

$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 9 & 3 \\ 1 & 0 & 4 \end{bmatrix}$ 

so it follows from parts (b), (e), and (g) of Theorem 2.3.8 that we can prove both results at the same 
time by showing that $\det(A) \neq 0$. We leave it for you to confirm that $\det(A) = -1$, which proves 
that the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ form a basis for $R^3$. 
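
The determinant left for the reader in Example 3 can be confirmed with integer arithmetic. This is an illustrative sketch; `det3` is a helper name introduced here, and cofactor expansion is just one of several ways to compute a $3 \times 3$ determinant.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient matrix of the systems in (3); its columns are v1, v2, v3.
A = [[1, 2, 3],
     [2, 9, 3],
     [1, 0, 4]]

det_A = det3(A)
print(det_A)
```

Because the determinant is nonzero, the homogeneous system has only the trivial solution and the nonhomogeneous system is consistent for every right-hand side.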



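EXAMPLE 4 The Standard Basis for $M_{mn}$ 

Show that the matrices 

$M_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad M_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad M_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad M_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ 

form a basis for the vector space $M_{22}$ of $2 \times 2$ matrices. 

Solution We must show that the matrices are linearly independent and span $M_{22}$. To prove linear 
independence we must show that the equation 

$c_1 M_1 + c_2 M_2 + c_3 M_3 + c_4 M_4 = 0$ (4) 

has only the trivial solution, where $0$ is the $2 \times 2$ zero matrix; and to prove that the matrices span 
$M_{22}$ we must show that every $2 \times 2$ matrix 

$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ 

can be expressed as 

$c_1 M_1 + c_2 M_2 + c_3 M_3 + c_4 M_4 = B$ (5) 

The matrix forms of Equations (4) and (5) are 

$c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ 

and 

$c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ 

which can be rewritten as 

$\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ 

Since the first equation has only the trivial solution 

$c_1 = c_2 = c_3 = c_4 = 0$ 

the matrices are linearly independent, and since the second equation has the solution 

$c_1 = a, \quad c_2 = b, \quad c_3 = c, \quad c_4 = d$ 

the matrices span $M_{22}$. This proves that the matrices $M_1$, $M_2$, $M_3$, $M_4$ form a basis for $M_{22}$. 
More generally, the $mn$ different matrices whose entries are zero except for a single entry of 1 form 
a basis for $M_{mn}$ called the standard basis for $M_{mn}$. 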



Some writers define the empty set to be a basis 
for the zero vector space, but we will not do so. 



It is not true that every vector space has a basis in the sense of Definition 1 . The simplest example is the zero 
vector space, which contains no linearly independent sets and hence no basis. The following is an example of a 
nonzero vector space that has no basis in the sense of Definition 1 because it cannot be spanned by finitely many 
vectors. 

EXAMPLE 5 A Vector Space That Has No Finite Spanning Set 

Show that the vector space $P_\infty$ of all polynomials with real coefficients has no finite spanning set. 

Solution If there were a finite spanning set, say $S = \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_r\}$, then the degrees of the 
polynomials in $S$ would have a maximum value, say $n$; and this in turn would imply that any linear 
combination of the polynomials in $S$ would have degree at most $n$. Thus, there would be no way to 
express the polynomial $x^{n+1}$ as a linear combination of the polynomials in $S$, contradicting the fact that 
the vectors in $S$ span $P_\infty$. 



For reasons that will become clear shortly, a vector space that cannot be spanned by finitely many vectors is said 
to be infinite-dimensional, whereas those that can are said to be finite-dimensional. 



EXAMPLE 6 Some Finite- and Infinite-Dimensional Spaces 

In Example 1, Example 2, and Example 4 we found bases for $R^n$, $P_n$, and $M_{mn}$, so these vector 
spaces are finite-dimensional. We showed in Example 5 that the vector space $P_\infty$ is not spanned by 
finitely many vectors and hence is infinite-dimensional. In the exercises of this section and the next 
we will ask you to show that the vector spaces $R^\infty$, $F(-\infty, \infty)$, $C(-\infty, \infty)$, $C^m(-\infty, \infty)$, and 
$C^\infty(-\infty, \infty)$ are infinite-dimensional. 



Coordinates Relative to a Basis 



Earlier in this section we drew an informal analogy between basis vectors and coordinate systems. Our next goal is 
to make this informal idea precise by defining the notion of a coordinate system in a general vector space. The 
following theorem will be our first step in that direction. 



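THEOREM 4.4.1 Uniqueness of Basis Representation 

If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space $V$, then every vector $\mathbf{v}$ in $V$ can be expressed in the 
form $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ in exactly one way. 



Proof Since $S$ spans $V$, it follows from the definition of a spanning set that every vector in $V$ is expressible as a 
linear combination of the vectors in $S$. To see that there is only one way to express a vector as a linear combination 
of the vectors in $S$, suppose that some vector $\mathbf{v}$ can be written as 

$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ 

and also as 

$\mathbf{v} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n$ 

Subtracting the second equation from the first gives 

$\mathbf{0} = (c_1 - k_1)\mathbf{v}_1 + (c_2 - k_2)\mathbf{v}_2 + \cdots + (c_n - k_n)\mathbf{v}_n$ 

Since the right side of this equation is a linear combination of vectors in $S$, the linear independence of $S$ implies 
that 

$c_1 - k_1 = 0, \quad c_2 - k_2 = 0, \quad \ldots, \quad c_n - k_n = 0$ 

that is, $c_1 = k_1$, $c_2 = k_2$, $\ldots$, $c_n = k_n$. Thus, the two expressions for $\mathbf{v}$ are the same. 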

Sometimes it will be desirable to write a 
coordinate vector as a column matrix, in which 
case we will denote it using square brackets as 

$[\mathbf{v}]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$ 

We will refer to $[\mathbf{v}]_S$ as a coordinate matrix and 
reserve the terminology coordinate vector for the 
comma delimited form $(\mathbf{v})_S$. 



We now have all of the ingredients required to define the notion of "coordinates" in a general vector space $V$. For 
motivation, observe that in $R^3$, for example, the coordinates $(a, b, c)$ of a vector $\mathbf{v}$ are precisely the coefficients in 
the formula 

$\mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$ 

that expresses $\mathbf{v}$ as a linear combination of the standard basis vectors for $R^3$ (see Figure 4.4.5). The following 
definition generalizes this idea. 



DEFINITION 2 

If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space $V$, and 

$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ 

is the expression for a vector $\mathbf{v}$ in terms of the basis $S$, then the scalars $c_1, c_2, \ldots, c_n$ are called the 
coordinates of $\mathbf{v}$ relative to the basis $S$. The vector $(c_1, c_2, \ldots, c_n)$ in $R^n$ constructed from these 
coordinates is called the coordinate vector of $\mathbf{v}$ relative to $S$; it is denoted by 

$(\mathbf{v})_S = (c_1, c_2, \ldots, c_n)$ (6) 

Remark Recall that two sets are considered to be the same if they have the same members, even if those 
members are written in a different order. However, if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a set of basis vectors, then changing 
the order in which the vectors are written would change the order of the entries in $(\mathbf{v})_S$, possibly producing a 
different coordinate vector. To avoid this complication, we will make the convention that in any discussion 
involving a basis $S$ the order of the vectors in $S$ remains fixed. Some authors call a set of basis vectors with this 
restriction an ordered basis. However, we will use this terminology only when emphasis on the order is required 
for clarity. 

Observe that $(\mathbf{v})_S$ is a vector in $R^n$, so that once a basis $S$ is given for a vector space $V$, Theorem 4.4.1 establishes a 
one-to-one correspondence between vectors in $V$ and vectors in $R^n$ (Figure 4.4.6). 

A one-to-one correspondence between $\mathbf{v}$ in $V$ and $(\mathbf{v})_S$ in $R^n$ 

Figure 4.4.6 



EXAMPLE 7 Coordinates Relative to the Standard Basis for $R^n$ 

In the special case where $V = R^n$ and $S$ is the standard basis, the coordinate vector $(\mathbf{v})_S$ and the vector 
$\mathbf{v}$ are the same; that is, 

$\mathbf{v} = (\mathbf{v})_S$ 

For example, in $R^3$ the representation of a vector $\mathbf{v} = (a, b, c)$ as a linear combination of the vectors in 
the standard basis $S = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ is 

$\mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$ 

so the coordinate vector relative to this basis is $(\mathbf{v})_S = (a, b, c)$, which is the same as the vector $\mathbf{v}$. 



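EXAMPLE 8 Coordinate Vectors Relative to Standard Bases 

(a) Find the coordinate vector for the polynomial 

$\mathbf{p}(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$ 

relative to the standard basis for the vector space $P_n$. 

(b) Find the coordinate vector of 

$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ 

relative to the standard basis for $M_{22}$. 

Solution 

(a) The given formula for $\mathbf{p}(x)$ expresses this polynomial as a linear combination of the standard 
basis vectors $S = \{1, x, x^2, \ldots, x^n\}$. Thus, the coordinate vector for $\mathbf{p}$ relative to $S$ is 

$(\mathbf{p})_S = (c_0, c_1, c_2, \ldots, c_n)$ 

(b) We showed in Example 4 that the representation of a vector 

$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ 

as a linear combination of the standard basis vectors is 

$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = a \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + d \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ 

so the coordinate vector of $B$ relative to $S$ is 

$(B)_S = (a, b, c, d)$ 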



EXAMPLE 9 Coordinates in $R^3$ 

(a) We showed in Example 3 that the vectors 

$\mathbf{v}_1 = (1, 2, 1), \quad \mathbf{v}_2 = (2, 9, 0), \quad \mathbf{v}_3 = (3, 3, 4)$ 

form a basis for $R^3$. Find the coordinate vector of $\mathbf{v} = (5, -1, 9)$ relative to the basis 
$S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. 

(b) Find the vector $\mathbf{v}$ in $R^3$ whose coordinate vector relative to $S$ is $(\mathbf{v})_S = (-1, 3, 2)$. 

Solution 

(a) To find $(\mathbf{v})_S$ we must first express $\mathbf{v}$ as a linear combination of the vectors in $S$; that is, we must 
find values of $c_1$, $c_2$, and $c_3$ such that 

$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ 

or, in terms of components, 

$(5, -1, 9) = c_1(1, 2, 1) + c_2(2, 9, 0) + c_3(3, 3, 4)$ 

Equating corresponding components gives 

$\begin{aligned} c_1 + 2c_2 + 3c_3 &= 5 \\ 2c_1 + 9c_2 + 3c_3 &= -1 \\ c_1 + 4c_3 &= 9 \end{aligned}$ 

Solving this system we obtain $c_1 = 1$, $c_2 = -1$, $c_3 = 2$ (verify). Therefore, 

$(\mathbf{v})_S = (1, -1, 2)$ 

(b) Using the definition of $(\mathbf{v})_S$, we obtain 

$\mathbf{v} = (-1)\mathbf{v}_1 + 3\mathbf{v}_2 + 2\mathbf{v}_3 = (-1)(1, 2, 1) + 3(2, 9, 0) + 2(3, 3, 4) = (11, 31, 7)$ 
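
Both parts of Example 9 can be reproduced computationally. The sketch below is one possible approach, not the method prescribed by the text: the function `solve` is a hypothetical helper that performs Gauss-Jordan elimination with exact fractions and assumes the coefficient matrix is square and invertible (which holds here because the basis vectors are linearly independent).

```python
from fractions import Fraction

def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination with exact rational arithmetic.

    Assumes A is square and invertible.
    """
    n = len(A)
    # Augmented matrix [A | b] with Fraction entries.
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot and swap it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot becomes 1.
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

basis = [(1, 2, 1), (2, 9, 0), (3, 3, 4)]

# Part (a): the columns of A are the basis vectors, so A c = v gives (v)_S.
A = [[vec[i] for vec in basis] for i in range(3)]
coords = solve(A, (5, -1, 9))

# Part (b): rebuild v from the coordinate vector (-1, 3, 2).
v = tuple(sum(c * vec[i] for c, vec in zip((-1, 3, 2), basis)) for i in range(3))

print(coords, v)
```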



Concept Review 

• Basis 

• Standard bases for $R^n$, $P_n$, $M_{mn}$

• Finite-dimensional 

• Infinite-dimensional 

• Coordinates 

• Coordinate vector 

Skills 

• Show that a set of vectors is a basis for a vector space. 

• Find the coordinates of a vector relative to a basis. 

• Find the coordinate vector of a vector relative to a basis. 



Exercise Set 4.4 



1. In words, explain why the following sets of vectors are not bases for the indicated vector spaces. 

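(a) $\mathbf{u}_1 = (1, 2)$, $\mathbf{u}_2 = (0, 3)$, $\mathbf{u}_3 = (2, 7)$ for $R^2$

(b) $\mathbf{u}_1 = (-1, 3, 2)$, $\mathbf{u}_2 = (6, 1, 1)$ for $R^3$

(c) $\mathbf{p}_1 = 1 + x + x^2$, $\mathbf{p}_2 = x - 1$ for $P_2$

(d) $A = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}$,
$B = \begin{bmatrix} 6 & 0 \\ -1 & 4 \end{bmatrix}$,
$C = \begin{bmatrix} 3 & 0 \\ 1 & 7 \end{bmatrix}$,
$D = \begin{bmatrix} 5 & 1 \\ 4 & 2 \end{bmatrix}$,
$E = \begin{bmatrix} 7 & 1 \\ 2 & 9 \end{bmatrix}$ for $M_{22}$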



Answer: 



(a) A basis for $R^2$ has two linearly independent vectors.

(b) A basis for $R^3$ has three linearly independent vectors.

(c) A basis for P2 has three linearly independent vectors. 

(d) A basis for M22 has four linearly independent vectors. 

2. Which of the following sets of vectors are bases for $R^2$?

(a) $\{(2, 1), (3, 0)\}$

(b) $\{(4, 1), (-7, -8)\}$

(c) $\{(0, 0), (1, 3)\}$

(d) $\{(3, 9), (-4, -12)\}$

3. Which of the following sets of vectors are bases for $R^3$?

(a) $\{(1, 0, 0), (2, 2, 0), (3, 3, 3)\}$

(b) $\{(3, 1, -4), (2, 5, 6), (1, 4, 8)\}$

(c) $\{(2, -3, 1), (4, 1, 1), (0, -7, 1)\}$

(d) $\{(1, 6, 4), (2, 4, -1), (-1, 2, 5)\}$



Answer: 
(a),(b) 

4. Which of the following form bases for $P_2$?

(a) $1 - 3x + 2x^2$, $1 + x + 4x^2$, $1 - 7x$

(b) $4 + 6x + x^2$, $-1 + 4x + 2x^2$, $5 + 2x - x^2$

(c) $1 + x + x^2$, $x + x^2$, $x^2$

(d) $-4 + x + 3x^2$, $6 + 5x + 2x^2$, $8 + 4x + x^2$

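5. Show that the following matrices form a basis for $M_{22}$.

\[ \begin{bmatrix} 3 & 6 \\ 3 & -6 \end{bmatrix}, \quad
\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & -8 \\ -12 & -4 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix} \]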

6. Let $V$ be the space spanned by $\mathbf{v}_1 = \cos^2 x$, $\mathbf{v}_2 = \sin^2 x$, $\mathbf{v}_3 = \cos 2x$.

(a) Show that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is not a basis for $V$.

(b) Find a basis for $V$.

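7. Find the coordinate vector of $\mathbf{w}$ relative to the basis $S = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$.

(a) $\mathbf{u}_1 = (1, 0)$, $\mathbf{u}_2 = (0, 1)$; $\mathbf{w} = (3, -7)$

(b) $\mathbf{u}_1 = (2, -4)$, $\mathbf{u}_2 = (3, 8)$; $\mathbf{w} = (1, 1)$

(c) $\mathbf{u}_1 = (1, 1)$, $\mathbf{u}_2 = (0, 2)$; $\mathbf{w} = (a, b)$

Answer:

(a) $(\mathbf{w})_S = (3, -7)$

(c) $(\mathbf{w})_S = \left(a, \dfrac{b - a}{2}\right)$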

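8. Find the coordinate vector of $\mathbf{w}$ relative to the basis $S = \{\mathbf{u}_1, \mathbf{u}_2\}$ of $R^2$.

(a) $\mathbf{u}_1 = (1, -1)$, $\mathbf{u}_2 = (1, 1)$; $\mathbf{w} = (1, 0)$

(b) $\mathbf{u}_1 = (1, -1)$, $\mathbf{u}_2 = (1, 1)$; $\mathbf{w} = (0, 1)$

(c) $\mathbf{u}_1 = (1, -1)$, $\mathbf{u}_2 = (1, 1)$; $\mathbf{w} = (1, 1)$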

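9. Find the coordinate vector of $\mathbf{v}$ relative to the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

(a) $\mathbf{v} = (2, -1, 3)$; $\mathbf{v}_1 = (1, 0, 0)$, $\mathbf{v}_2 = (2, 2, 0)$, $\mathbf{v}_3 = (3, 3, 3)$

(b) $\mathbf{v} = (5, -12, 3)$; $\mathbf{v}_1 = (1, 2, 3)$, $\mathbf{v}_2 = (-4, 5, 6)$, $\mathbf{v}_3 = (7, -8, 9)$

Answer:

(a) $(\mathbf{v})_S = (3, -2, 1)$

(b) $(\mathbf{v})_S = (-2, 0, 1)$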



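10. Find the coordinate vector of $\mathbf{p}$ relative to the basis $S = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.

(a) $\mathbf{p} = 4 - 3x + x^2$; $\mathbf{p}_1 = 1$, $\mathbf{p}_2 = x$, $\mathbf{p}_3 = x^2$

(b) $\mathbf{p} = 2 - x + x^2$; $\mathbf{p}_1 = 1 + x$, $\mathbf{p}_2 = 1 + x^2$, $\mathbf{p}_3 = x + x^2$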

11. Find the coordinate vector of $A$ relative to the basis $S = \{A_1, A_2, A_3, A_4\}$.

Answer: 

$(A)_S = (-1, 1, -1, 3)$

In Exercises 12-13, show that $\{A_1, A_2, A_3, A_4\}$ is a basis for $M_{22}$, and express $A$ as a linear combination of the
basis vectors.

Answer: 

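In Exercises 14-15, show that $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ is a basis for $P_2$, and express $\mathbf{p}$ as a linear combination of the basis
vectors.

14. $\mathbf{p}_1 = 1 + 2x + x^2$, $\mathbf{p}_2 = 2 + 9x$, $\mathbf{p}_3 = 3 + 3x + 4x^2$; $\mathbf{p} = 2 + 17x - 3x^2$

15. $\mathbf{p}_1 = 1 + x + x^2$, $\mathbf{p}_2 = x + x^2$, $\mathbf{p}_3 = x^2$; $\mathbf{p} = 7 - x + 2x^2$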
Answer: 

P = 7pi-8p2 + 3p3 

16. The accompanying figure shows a rectangular $xy$-coordinate system and an $x'y'$-coordinate system with
skewed axes. Assuming that 1-unit scales are used on all the axes, find the $x'y'$-coordinates of the points
whose $xy$-coordinates are given.

(a) (1, 1)

(b) (1, 0)

(c) (0, 1)

(d) $(a, b)$




Figure Ex-16 

17. The accompanying figure shows a rectangular $xy$-coordinate system determined by the unit basis vectors $\mathbf{i}$ and
$\mathbf{j}$ and an $x'y'$-coordinate system determined by unit basis vectors $\mathbf{u}_1$ and $\mathbf{u}_2$. Find the $x'y'$-coordinates of the
points whose $xy$-coordinates are given.

(a) $(\sqrt{3}, 1)$

(b) (1, 0)

(c) (0, 1)

(d) $(a, b)$



Answer: 





Figure Ex-17 



18. The basis that we gave for $M_{22}$ in Example 4 consisted of noninvertible matrices. Do you think that there is a
basis for $M_{22}$ consisting of invertible matrices? Justify your answer.

19. Prove that $R^\infty$ is infinite-dimensional.

True-False Exercises 



In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 



(a) If $V = \operatorname{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$, then $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is a basis for $V$.



Answer: 



False 

(b) Every linearly independent subset of a vector space $V$ is a basis for $V$.
Answer: 

False 

(c) If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space $V$, then every vector in $V$ can be expressed as a linear
combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.

Answer: 

True 

(d) The coordinate vector of a vector $\mathbf{x}$ in $R^n$ relative to the standard basis for $R^n$ is $\mathbf{x}$.
Answer: 

True 

(e) Every basis of P4 contains at least one polynomial of degree 3 or less. 
Answer: 

False 

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



4.5 Dimension 



We showed in the previous section that the standard basis for $R^n$ has $n$ vectors and hence that the standard basis
for $R^3$ has three vectors, the standard basis for $R^2$ has two vectors, and the standard basis for $R^1$ $(= R)$ has one
vector. Since we think of space as three dimensional, a plane as two dimensional, and a line as one
dimensional, there seems to be a link between the number of vectors in a basis and the dimension of a vector
space. We will develop this idea in this section.



Number of Vectors in a Basis 

Our first goal in this section is to establish the following fundamental theorem. 
THEOREM 4.5.1

All bases for a finite-dimensional vector space have the same number of vectors.

To prove this theorem we will need the following preliminary result, whose proof is deferred to the end of the
section.



THEOREM 4.5.2 

Let $V$ be a finite-dimensional vector space, and let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be any basis.

(a) If a set has more than n vectors, then it is linearly dependent. 

(b) If a set has fewer than n vectors, then it does not span V. 



Some writers regard the empty set to be a basis 
for the zero vector space. This is consistent with 
our definition of dimension, since the empty set 
has no vectors and the zero vector space has 
dimension zero. 



We can now see rather easily why Theorem 4.5.1 is true; for if

\[ S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\} \]

is an arbitrary basis for $V$, then, because $S$ is a basis, Theorem 4.5.2 implies that any set in $V$ with more than $n$ vectors
is linearly dependent and any set in $V$ with fewer than $n$ vectors does not span $V$. Thus, unless a set in $V$ has
exactly $n$ vectors it cannot be a basis.



We noted in the introduction to this section that for certain familiar vector spaces the intuitive notion of 
dimension coincides with the number of vectors in a basis. The following definition makes this idea precise. 






Engineers often use the term degrees of 
freedom as a synonym for dimension. 



DEFINITION 1 

The dimension of a finite-dimensional vector space $V$ is denoted by $\dim(V)$ and is defined to be the
number of vectors in a basis for $V$. In addition, the zero vector space is defined to have dimension zero.



EXAMPLE 1 Dimensions of Some Familiar Vector Spaces

$\dim(R^n) = n$          The standard basis has $n$ vectors.

$\dim(P_n) = n + 1$      The standard basis has $n + 1$ vectors.

$\dim(M_{mn}) = mn$      The standard basis has $mn$ vectors.



EXAMPLE 2 Dimension of Span(S)

If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set in a vector space $V$, then $S$ is automatically
a basis for $\operatorname{span}(S)$ (why?), and this implies that

\[ \dim[\operatorname{span}(S)] = r \]

In words, the dimension of the space spanned by a linearly independent set of vectors is equal to 
the number of vectors in that set. 



EXAMPLE 3 Dimension of a Solution Space

Find a basis for and the dimension of the solution space of the homogeneous system

\[ \begin{aligned}
-x_1 - x_2 + 2x_3 - 3x_4 + x_5 &= 0 \\
x_1 + x_2 - 2x_3 \phantom{{}- 3x_4} - x_5 &= 0 \\
x_3 + x_4 + x_5 &= 0
\end{aligned} \]

Solution We leave it for you to solve this system by Gauss-Jordan elimination and show that
its general solution is

\[ x_1 = -s - t, \quad x_2 = s, \quad x_3 = -t, \quad x_4 = 0, \quad x_5 = t \]

which can be written in vector form as

\[ (x_1, x_2, x_3, x_4, x_5) = (-s - t, s, -t, 0, t) \]

or, alternatively, as

\[ (x_1, x_2, x_3, x_4, x_5) = s(-1, 1, 0, 0, 0) + t(-1, 0, -1, 0, 1) \]

This shows that the vectors $\mathbf{v}_1 = (-1, 1, 0, 0, 0)$ and $\mathbf{v}_2 = (-1, 0, -1, 0, 1)$ span the
solution space. Since neither vector is a scalar multiple of the other, they are linearly independent
and hence form a basis for the solution space. Thus, the solution space has dimension 2.
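The procedure used in this example (row reduce, assign a parameter to each free variable, and read off one vector per parameter) can be sketched in code. The following Python sketch is illustrative only: the function names `rref` and `solution_space_basis` are hypothetical, and the coefficient matrices are simply the systems of Example 3 above and of Example 6 of Section 1.2 typed in as lists of rows.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a row at or below row r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: column c is a free variable
        m[r], m[p] = m[p], m[r]
        scale = m[r][c]
        m[r] = [x / scale for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def solution_space_basis(rows):
    """One basis vector per free variable of the homogeneous system rows * x = 0."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)             # set this parameter to 1, the others to 0
        for r, p in enumerate(pivots):
            v[p] = -m[r][free]            # solve for each leading variable
        basis.append(v)
    return basis

example3 = [[-1, -1, 2, -3, 1],
            [1, 1, -2, 0, -1],
            [0, 0, 1, 1, 1]]
basis3 = solution_space_basis(example3)
dimension3 = len(basis3)
```

For the system above this yields the two vectors (-1, 1, 0, 0, 0) and (-1, 0, -1, 0, 1), so the computed dimension is 2. The sketch assumes exact rational input; with floating-point data a tolerance would be needed when testing entries against zero.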



EXAMPLE 4 Dimension of a Solution Space

Find a basis for and the dimension of the solution space of the homogeneous system

\[ \begin{aligned}
x_1 + 3x_2 - 2x_3 \phantom{{}- 2x_4} + 2x_5 \phantom{{}- 3x_6} &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 \phantom{{}+ 4x_5} + 15x_6 &= 0 \\
2x_1 + 6x_2 \phantom{{}- 5x_3} + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned} \]

Solution In Example 6 of Section 1.2 we found the solution of this system to be

\[ x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0 \]

which can be written in vector form as

\[ (x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t, r, -2s, s, t, 0) \]

or, alternatively, as

\[ (x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) \]

This shows that the vectors

\[ \mathbf{v}_1 = (-3, 1, 0, 0, 0, 0), \quad \mathbf{v}_2 = (-4, 0, -2, 1, 0, 0), \quad \mathbf{v}_3 = (-2, 0, 0, 0, 1, 0) \]

span the solution space. We leave it for you to check that these vectors are linearly independent
by showing that none of them is a linear combination of the other two (but see the remark that
follows). Thus, the solution space has dimension 3.



Remark It can be shown that for a homogeneous linear system, the method of the last example always 
produces a basis for the solution space of the system. We omit the formal proof. 



Some Fundamental Theorems 

We will devote the remainder of this section to a series of theorems that reveal the subtle interrelationships
among the concepts of linear independence, basis, and dimension. These theorems are not simply exercises in
mathematical theory; they are essential to the understanding of vector spaces and the applications that build
on them.

We will start with a theorem (proved at the end of this section) that is concerned with the effect on linear
independence and spanning if a vector is added to or removed from a given nonempty set of vectors.
Informally stated, if you start with a linearly independent set $S$ and adjoin to it a vector that is not a linear
combination of those in $S$, then the enlarged set will still be linearly independent. Also, if you start with a set $S$
of two or more vectors in which one of the vectors is a linear combination of the others, then that vector can be
removed from $S$ without affecting $\operatorname{span}(S)$ (Figure 4.5.1).

Figure 4.5.1 (three panels)

• The vector outside the plane can be adjoined to the other two without affecting their linear independence.

• Any of the vectors can be removed, and the remaining two will still span the plane.

• Either of the collinear vectors can be removed, and the remaining two will still span the plane.

THEOREM 4.5.3 Plus/Minus Theorem

Let $S$ be a nonempty set of vectors in a vector space $V$.

(a) If $S$ is a linearly independent set, and if $\mathbf{v}$ is a vector in $V$ that is outside of $\operatorname{span}(S)$, then the set
$S \cup \{\mathbf{v}\}$ that results by inserting $\mathbf{v}$ into $S$ is still linearly independent.

(b) If $\mathbf{v}$ is a vector in $S$ that is expressible as a linear combination of other vectors in $S$, and if $S - \{\mathbf{v}\}$
denotes the set obtained by removing $\mathbf{v}$ from $S$, then $S$ and $S - \{\mathbf{v}\}$ span the same space; that is,

\[ \operatorname{span}(S) = \operatorname{span}(S - \{\mathbf{v}\}) \]

EXAMPLE 5 Applying the Plus/Minus Theorem

Show that $\mathbf{p}_1 = 1 - x^2$, $\mathbf{p}_2 = 2 - x^2$, and $\mathbf{p}_3 = x^3$ are linearly independent vectors.

Solution The set $S = \{\mathbf{p}_1, \mathbf{p}_2\}$ is linearly independent, since neither vector in $S$ is a scalar
multiple of the other. Since the vector $\mathbf{p}_3$ cannot be expressed as a linear combination of the
vectors in $S$ (why?), it can be adjoined to $S$ to produce a linearly independent set
$S \cup \{\mathbf{p}_3\} = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.



In general, to show that a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space $V$, we must show that the
vectors are linearly independent and span $V$. However, if we happen to know that $V$ has dimension $n$ (so that
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ contains the right number of vectors for a basis), then it suffices to check either linear
independence or spanning; the remaining condition will hold automatically. This is the content of the
following theorem.

THEOREM 4.5.4

Let $V$ be an $n$-dimensional vector space, and let $S$ be a set in $V$ with exactly $n$ vectors. Then $S$ is a basis
for $V$ if and only if $S$ spans $V$ or $S$ is linearly independent.

Proof Assume that $S$ has exactly $n$ vectors and spans $V$. To prove that $S$ is a basis, we must show that $S$ is a
linearly independent set. But if this is not so, then some vector $\mathbf{v}$ in $S$ is a linear combination of the remaining
vectors. If we remove this vector from $S$, then it follows from Theorem 4.5.3(b) that the remaining set of $n - 1$
vectors still spans $V$. But this is impossible, since it follows from Theorem 4.5.2(b) that no set with fewer than $n$
vectors can span an $n$-dimensional vector space. Thus $S$ is linearly independent.

Assume that $S$ has exactly $n$ vectors and is a linearly independent set. To prove that $S$ is a basis, we must show
that $S$ spans $V$. But if this is not so, then there is some vector $\mathbf{v}$ in $V$ that is not in $\operatorname{span}(S)$. If we insert this
vector into $S$, then it follows from Theorem 4.5.3(a) that this set of $n + 1$ vectors is still linearly independent.
But this is impossible, since Theorem 4.5.2(a) states that no set with more than $n$ vectors in an $n$-dimensional
vector space can be linearly independent. Thus $S$ spans $V$.

EXAMPLE 6 Bases by Inspection

(a) By inspection, explain why $\mathbf{v}_1 = (-3, 7)$ and $\mathbf{v}_2 = (5, 5)$ form a basis for $R^2$.

(b) By inspection, explain why $\mathbf{v}_1 = (2, 0, -1)$, $\mathbf{v}_2 = (4, 0, 7)$, and $\mathbf{v}_3 = (-1, 1, 4)$ form a
basis for $R^3$.

Solution

(a) Since neither vector is a scalar multiple of the other, the two vectors form a linearly
independent set in the two-dimensional space $R^2$, and hence they form a basis by Theorem
4.5.4.

(b) The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ form a linearly independent set in the $xz$-plane (why?). The vector $\mathbf{v}_3$
is outside of the $xz$-plane, so the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is also linearly independent. Since $R^3$ is
three-dimensional, Theorem 4.5.4 implies that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $R^3$.
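When inspection is not enough, Theorem 4.5.4 suggests a mechanical test for $R^n$: a set of exactly $n$ vectors is a basis as soon as it is linearly independent. The Python sketch below is one way to carry this out, assuming independence is tested by counting the pivots produced by row reduction; the names `rank` and `is_basis_of_Rn` are illustrative and do not come from the text.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots obtained by row reducing the matrix whose rows are the vectors."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_basis_of_Rn(vectors):
    """True when the vectors are n linearly independent vectors in R^n."""
    n = len(vectors[0])
    # By Theorem 4.5.4, independence alone suffices once the count equals n.
    return len(vectors) == n and rank(vectors) == n

example_6a = is_basis_of_Rn([(-3, 7), (5, 5)])
example_6b = is_basis_of_Rn([(2, 0, -1), (4, 0, 7), (-1, 1, 4)])
too_few = is_basis_of_Rn([(-1, 3, 2), (6, 1, 1)])     # two vectors cannot be a basis for R^3
```

Both sets in Example 6 pass the test, while a set with the wrong number of vectors is rejected without any row reduction.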

The next theorem (whose proof is deferred to the end of this section) reveals two important facts about the
vectors in a finite-dimensional vector space $V$:

1. Every spanning set for a subspace is either a basis for that subspace or has a basis as a subset. 

2. Every linearly independent set in a subspace is either a basis for that subspace or can be extended to a basis 
for it. 



THEOREM 4.5.5

Let $S$ be a finite set of vectors in a finite-dimensional vector space $V$.

(a) If $S$ spans $V$ but is not a basis for $V$, then $S$ can be reduced to a basis for $V$ by removing appropriate
vectors from $S$.

(b) If $S$ is a linearly independent set that is not already a basis for $V$, then $S$ can be enlarged to a basis
for $V$ by inserting appropriate vectors into $S$.

We conclude this section with a theorem that relates the dimension of a vector space to the dimensions of its 
subspaces. 

THEOREM 4.5.6

If $W$ is a subspace of a finite-dimensional vector space $V$, then:

(a) $W$ is finite-dimensional.

(b) $\dim(W) \le \dim(V)$.

(c) $W = V$ if and only if $\dim(W) = \dim(V)$.

Proof (a) We will leave the proof of this part for the exercises.

Proof (b) Part (a) shows that $W$ is finite-dimensional, so it has a basis

\[ S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\} \]

Either $S$ is also a basis for $V$ or it is not. If so, then $\dim(V) = m$, which means that $\dim(V) = \dim(W)$. If not,
then because $S$ is a linearly independent set it can be enlarged to a basis for $V$ by part (b) of Theorem 4.5.5. But
this implies that $\dim(W) < \dim(V)$, so we have shown that $\dim(W) \le \dim(V)$ in all cases.

Proof (c) Assume that $\dim(W) = \dim(V)$ and that

\[ S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\} \]

is a basis for $W$. If $S$ is not also a basis for $V$, then being linearly independent $S$ can be extended to a basis for $V$
by part (b) of Theorem 4.5.5. But this would mean that $\dim(V) > \dim(W)$, which contradicts our hypothesis.
Thus $S$ must also be a basis for $V$, which means that $W = V$.

Figure 4.5.2 illustrates the geometric relationship between the subspaces of $R^3$ in order of increasing
dimension.

Figure 4.5.2 The origin (0-dimensional); a line through the origin (1-dimensional); a plane through the origin (2-dimensional); $R^3$ (3-dimensional)



OPTIONAL 

We conclude this section with optional proofs of Theorem 4.5.2, Theorem 4.5.3, and Theorem 4.5.5. 

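Proof of Theorem 4.5.2(a) Let $S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$ be any set of $m$ vectors in $V$, where $m > n$. We
want to show that $S'$ is linearly dependent. Since $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis, each $\mathbf{w}_i$ can be expressed as a
linear combination of the vectors in $S$, say

\[ \begin{aligned}
\mathbf{w}_1 &= a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + \cdots + a_{n1}\mathbf{v}_n \\
\mathbf{w}_2 &= a_{12}\mathbf{v}_1 + a_{22}\mathbf{v}_2 + \cdots + a_{n2}\mathbf{v}_n \\
&\;\;\vdots \\
\mathbf{w}_m &= a_{1m}\mathbf{v}_1 + a_{2m}\mathbf{v}_2 + \cdots + a_{nm}\mathbf{v}_n
\end{aligned} \tag{1} \]

To show that $S'$ is linearly dependent, we must find scalars $k_1, k_2, \ldots, k_m$, not all zero, such that

\[ k_1\mathbf{w}_1 + k_2\mathbf{w}_2 + \cdots + k_m\mathbf{w}_m = \mathbf{0} \tag{2} \]

Using the equations in 1, we can rewrite 2 as

\[ \begin{aligned}
(k_1 a_{11} + k_2 a_{12} + \cdots + k_m a_{1m})\mathbf{v}_1 \\
{}+ (k_1 a_{21} + k_2 a_{22} + \cdots + k_m a_{2m})\mathbf{v}_2 \\
{}+ \cdots + (k_1 a_{n1} + k_2 a_{n2} + \cdots + k_m a_{nm})\mathbf{v}_n &= \mathbf{0}
\end{aligned} \]

Thus, from the linear independence of $S$, the problem of proving that $S'$ is a linearly dependent set reduces to
showing there are scalars $k_1, k_2, \ldots, k_m$, not all zero, that satisfy

\[ \begin{aligned}
a_{11}k_1 + a_{12}k_2 + \cdots + a_{1m}k_m &= 0 \\
a_{21}k_1 + a_{22}k_2 + \cdots + a_{2m}k_m &= 0 \\
&\;\;\vdots \\
a_{n1}k_1 + a_{n2}k_2 + \cdots + a_{nm}k_m &= 0
\end{aligned} \tag{3} \]

But 3 has more unknowns than equations, so the proof is complete since Theorem 1.2.2 guarantees the
existence of nontrivial solutions.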

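Proof of Theorem 4.5.2(b) Let $S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$ be any set of $m$ vectors in $V$, where $m < n$. We
want to show that $S'$ does not span $V$. We will do this by showing that the assumption that $S'$ spans $V$ leads to a
contradiction of the linear independence of $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. If $S'$ spans $V$, then every vector in $V$ is a linear
combination of the vectors in $S'$. In particular, each basis vector $\mathbf{v}_i$ is a linear combination of the vectors in $S'$,
say

\[ \begin{aligned}
\mathbf{v}_1 &= a_{11}\mathbf{w}_1 + a_{21}\mathbf{w}_2 + \cdots + a_{m1}\mathbf{w}_m \\
\mathbf{v}_2 &= a_{12}\mathbf{w}_1 + a_{22}\mathbf{w}_2 + \cdots + a_{m2}\mathbf{w}_m \\
&\;\;\vdots \\
\mathbf{v}_n &= a_{1n}\mathbf{w}_1 + a_{2n}\mathbf{w}_2 + \cdots + a_{mn}\mathbf{w}_m
\end{aligned} \tag{4} \]

To obtain our contradiction, we will show that there are scalars $k_1, k_2, \ldots, k_n$, not all zero, such that

\[ k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n = \mathbf{0} \tag{5} \]

But 4 and 5 have the same form as 1 and 2 except that $m$ and $n$ are interchanged and the $\mathbf{w}$'s and $\mathbf{v}$'s are
interchanged. Thus, the computations that led to 3 now yield

\[ \begin{aligned}
a_{11}k_1 + a_{12}k_2 + \cdots + a_{1n}k_n &= 0 \\
a_{21}k_1 + a_{22}k_2 + \cdots + a_{2n}k_n &= 0 \\
&\;\;\vdots \\
a_{m1}k_1 + a_{m2}k_2 + \cdots + a_{mn}k_n &= 0
\end{aligned} \]

This linear system has more unknowns than equations and hence has nontrivial solutions by Theorem 1.2.2.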

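Proof of Theorem 4.5.3(a) Assume that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set of vectors in $V$,
and $\mathbf{v}$ is a vector in $V$ outside of $\operatorname{span}(S)$. To show that $S' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{v}\}$ is a linearly independent set,
we must show that the only scalars that satisfy

\[ k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r + k_{r+1}\mathbf{v} = \mathbf{0} \tag{6} \]

are $k_1 = k_2 = \cdots = k_r = k_{r+1} = 0$. But it must be true that $k_{r+1} = 0$, for otherwise we could solve 6 for $\mathbf{v}$
as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$, contradicting the assumption that $\mathbf{v}$ is outside of $\operatorname{span}(S)$. Thus, 6
simplifies to

\[ k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0} \tag{7} \]

which, by the linear independence of $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$, implies that

\[ k_1 = k_2 = \cdots = k_r = 0 \]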

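Proof of Theorem 4.5.3(b) Assume that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a set of vectors in $V$, and (to be specific)
suppose that $\mathbf{v}_r$ is a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}$, say

\[ \mathbf{v}_r = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r-1}\mathbf{v}_{r-1} \tag{8} \]

We want to show that if $\mathbf{v}_r$ is removed from $S$, then the remaining set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}\}$ still spans
$\operatorname{span}(S)$; that is, we must show that every vector $\mathbf{w}$ in $\operatorname{span}(S)$ is expressible as a linear combination of
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}\}$. But if $\mathbf{w}$ is in $\operatorname{span}(S)$, then $\mathbf{w}$ is expressible in the form

\[ \mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_{r-1}\mathbf{v}_{r-1} + k_r\mathbf{v}_r \]

or, on substituting 8,

\[ \mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_{r-1}\mathbf{v}_{r-1} + k_r(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r-1}\mathbf{v}_{r-1}) \]

which expresses $\mathbf{w}$ as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}$.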

Proof of Theorem 4.5.5(a) If $S$ is a set of vectors that spans $V$ but is not a basis for $V$, then $S$ is a linearly
dependent set. Thus some vector $\mathbf{v}$ in $S$ is expressible as a linear combination of the other vectors in $S$. By the
Plus/Minus Theorem (4.5.3b), we can remove $\mathbf{v}$ from $S$, and the resulting set $S'$ will still span $V$. If $S'$ is linearly
independent, then $S'$ is a basis for $V$, and we are done. If $S'$ is linearly dependent, then we can remove some
appropriate vector from $S'$ to produce a set $S''$ that still spans $V$. We can continue removing vectors in this way
until we finally arrive at a set of vectors in $S$ that is linearly independent and spans $V$. This subset of $S$ is a basis
for $V$.

Proof of Theorem 4.5.5(b) Suppose that $\dim(V) = n$. If $S$ is a linearly independent set that is not already a
basis for $V$, then $S$ fails to span $V$, so there is some vector $\mathbf{v}$ in $V$ that is not in $\operatorname{span}(S)$. By the Plus/Minus
Theorem (4.5.3a), we can insert $\mathbf{v}$ into $S$, and the resulting set $S'$ will still be linearly independent. If $S'$ spans $V$,
then $S'$ is a basis for $V$, and we are finished. If $S'$ does not span $V$, then we can insert an appropriate vector into
$S'$ to produce a set $S''$ that is still linearly independent. We can continue inserting vectors in this way until we
reach a set with $n$ linearly independent vectors in $V$. This set will be a basis for $V$ by Theorem 4.5.4.
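The argument for part (b) is constructive, and for subspaces of $R^n$ it can be sketched as a short program. The Python sketch below makes one simplifying assumption that is not in the proof: the candidate vectors to insert are taken from the standard basis, tried in order. The names `rank` and `extend_to_basis` are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots obtained by row reducing the matrix whose rows are the vectors."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def extend_to_basis(independent):
    """Enlarge a linearly independent list of vectors in R^n to a basis for R^n."""
    n = len(independent[0])
    basis = [list(v) for v in independent]
    for j in range(n):
        if len(basis) == n:
            break                          # n independent vectors: a basis by Theorem 4.5.4
        e = [int(i == j) for i in range(n)]
        # e lies outside span(basis) exactly when adjoining it raises the rank,
        # and then the Plus/Minus Theorem keeps the enlarged set independent.
        if rank(basis + [e]) == len(basis) + 1:
            basis.append(e)
    return basis

extended = extend_to_basis([(1, -4, 2, -3), (-3, 8, -4, 6)])
```

For the two vectors shown, the first standard basis vector is skipped because it already lies in their span, and the second and third are adjoined. Trying the standard basis vectors in a different order would generally produce a different, equally valid basis.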



Concept Review 

• Dimension 

• Relationships among the concepts of linear independence, basis, and dimension 

Skills 

• Find a basis for and the dimension of the solution space of a homogeneous linear system. 

• Use dimension to determine whether a set of vectors is a basis for a finite-dimensional vector space.

• Extend a linearly independent set to a basis. 



Exercise Set 4.5 

In Exercises 1-6, find a basis for the solution space of the homogeneous linear system, and find the 
dimension of that space. 

1. $x_1 + x_2 - x_3 = 0$
$-2x_1 - x_2 + 2x_3 = 0$
$-x_1 + x_3 = 0$

Answer: 



Basis: (1, 0, 1); dimension = 1 



2. $3x_1 + x_2 + x_3 + x_4 = 0$

3. $x_1 - 4x_2 + 3x_3 - x_4 = 0$
$2x_1 - 8x_2 + 6x_3 - 2x_4 = 0$

Answer: 

Basis: (4, 1,0,0), (-3,0,1,0), (1, 0, 0, 1); dimension = 3 

4. $x_1 - 3x_2 + x_3 = 0$
$2x_1 - 6x_2 + 2x_3 = 0$
$3x_1 - 9x_2 + 3x_3 = 0$

5. $2x_1 + x_2 + 3x_3 = 0$
$x_1 + 5x_3 = 0$
$x_2 + x_3 = 0$

Answer: 

No basis; dimension = 0 

6. $x + y + z = 0$
$3x + 2y - 2z = 0$
$4x + 3y - z = 0$
$6x + 5y + z = 0$

7. Find bases for the following subspaces of $R^3$.

(a) The plane $3x - 2y + 5z = 0$.

(b) The plane $x - y = 0$.

(c) The line $x = 2t$, $y = -t$, $z = 4t$.

(d) All vectors of the form $(a, b, c)$, where $b = a + c$.

Answer:

(b) (1, 1, 0), (0, 0, 1)

(c) (2, -1, 4)

(d) (1, 1, 0), (0, 1, 1)

8. Find the dimensions of the following subspaces of $R^4$.

(a) All vectors of the form $(a, b, c, 0)$.

(b) All vectors of the form $(a, b, c, d)$, where $d = a + b$ and $c = a - b$.

(c) All vectors of the form $(a, b, c, d)$, where $a = b = c = d$.

9. Find the dimension of each of the following vector spaces.

(a) The vector space of all diagonal $n \times n$ matrices.

(b) The vector space of all symmetric $n \times n$ matrices.

(c) The vector space of all upper triangular $n \times n$ matrices.

Answer:

(a) $n$

(b) $\dfrac{n(n + 1)}{2}$

(c) $\dfrac{n(n + 1)}{2}$
10. Find the dimension of the subspace of $P_3$ consisting of all polynomials $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ for which
$a_0 = 0$.

11. (a) Show that the set $W$ of all polynomials in $P_2$ such that $p(1) = 0$ is a subspace of $P_2$.

(b) Make a conjecture about the dimension of $W$.

(c) Confirm your conjecture by finding a basis for $W$.

12. Find a standard basis vector for $R^3$ that can be added to the set $\{\mathbf{v}_1, \mathbf{v}_2\}$ to produce a basis for $R^3$.

(a) $\mathbf{v}_1 = (-1, 2, 3)$, $\mathbf{v}_2 = (1, -2, -2)$

(b) $\mathbf{v}_1 = (1, -1, 0)$, $\mathbf{v}_2 = (3, 1, -2)$

13. Find standard basis vectors for $R^4$ that can be added to the set $\{\mathbf{v}_1, \mathbf{v}_2\}$ to produce a basis for $R^4$.

$\mathbf{v}_1 = (1, -4, 2, -3)$, $\mathbf{v}_2 = (-3, 8, -4, 6)$

Answer: 

Any two of (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1) can be used. 

14. Let $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be a basis for a vector space $V$. Show that $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is also a basis, where $\mathbf{u}_1 = \mathbf{v}_1$,
$\mathbf{u}_2 = \mathbf{v}_1 + \mathbf{v}_2$, and $\mathbf{u}_3 = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3$.

15. The vectors $\mathbf{v}_1 = (1, -2, 3)$ and $\mathbf{v}_2 = (0, 5, -3)$ are linearly independent. Enlarge $\{\mathbf{v}_1, \mathbf{v}_2\}$ to a basis
for $R^3$.

Answer:

$\mathbf{v}_3 = (a, b, c)$ with $9a - 3b - 5c \ne 0$

16. The vectors $\mathbf{v}_1 = (1, -2, 3, -5)$ and $\mathbf{v}_2 = (0, -1, 2, -3)$ are linearly independent. Enlarge
$\{\mathbf{v}_1, \mathbf{v}_2\}$ to a basis for $R^4$.

17. (a) Show that for every positive integer $n$, one can find $n + 1$ linearly independent vectors in $F(-\infty, \infty)$.
[Hint: Look for polynomials.]

(b) Use the result in part (a) to prove that $F(-\infty, \infty)$ is infinite-dimensional.

(c) Prove that $C(-\infty, \infty)$, $C^m(-\infty, \infty)$, and $C^\infty(-\infty, \infty)$ are infinite-dimensional vector spaces.


18. Let $S$ be a basis for an $n$-dimensional vector space $V$. Show that if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ form a linearly
independent set of vectors in $V$, then the coordinate vectors $(\mathbf{v}_1)_S, (\mathbf{v}_2)_S, \ldots, (\mathbf{v}_r)_S$ form a linearly
independent set in $R^n$, and conversely.

19. Using the notation from Exercise 18, show that if the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ span $V$, then the coordinate
vectors $(\mathbf{v}_1)_S, (\mathbf{v}_2)_S, \ldots, (\mathbf{v}_r)_S$ span $R^n$, and conversely.

20. Find a basis for the subspace of $P_2$ spanned by the given vectors.

(a) $-1 + x - 2x^2$, $3 + 3x + 6x^2$, $9$

(b) $1 + x$, $x^2$, $-2 + 2x^2$, $-3x$

(c) $1 + x - 3x^2$, $2 + 2x - 6x^2$, $3 + 3x - 9x^2$

[Hint: Let $S$ be the standard basis for $P_2$, and work with the coordinate vectors relative to $S$ as in Exercises
18 and 19.]

21. Prove: A subspace of a finite-dimensional vector space is finite-dimensional. 

22. State the two parts of Theorem 4.5.2 in contrapositive form. 

True-False Exercises 

In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 

(a) The zero vector space has dimension zero. 
Answer: 

True 

(b) There is a set of 17 linearly independent vectors in $R^{17}$.
Answer: 

True 

(c) There is a set of 11 vectors that span $R^{17}$.
Answer: 

False 

(d) Every linearly independent set of five vectors in $R^5$ is a basis for $R^5$.
Answer: 

True 

(e) Every set of five vectors that spans $R^5$ is a basis for $R^5$.
Answer: 

True 

(f) Every set of vectors that spans $R^n$ contains a basis for $R^n$.
Answer: 

True 



(g) Every linearly independent set of vectors in $R^n$ is contained in some basis for $R^n$.
Answer: 

True 

(h) There is a basis for M22 consisting of invertible matrices. 
Answer: 

True 

(i) If $A$ has size $n \times n$ and $I_n, A, A^2, \ldots, A^{n^2}$ are distinct matrices, then $\{I_n, A, A^2, \ldots, A^{n^2}\}$ is linearly
dependent.
Answer: 

True 

(j) There are at least two distinct three-dimensional subspaces of $P_2$.
Answer: 
False 
4.6 Change of Basis

A basis that is suitable for one problem may not be suitable for another, so it is a common process in the study
of vector spaces to change from one basis to another. Because a basis is the vector space generalization of a
coordinate system, changing bases is akin to changing coordinate axes in $R^2$ and $R^3$. In this section we will
study problems related to change of basis.



Coordinate Maps 



If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a finite-dimensional vector space $V$, and if

\[ (\mathbf{v})_S = (c_1, c_2, \ldots, c_n) \]

is the coordinate vector of $\mathbf{v}$ relative to $S$, then, as observed in Section 4.4, the mapping

\[ \mathbf{v} \to (\mathbf{v})_S \tag{1} \]

creates a connection (a one-to-one correspondence) between vectors in the general vector space $V$ and vectors
in the familiar vector space $R^n$. We call 1 the coordinate map from $V$ to $R^n$. In this section we will find it
convenient to express coordinate vectors in the matrix form

\[ [\mathbf{v}]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \tag{2} \]

where the square brackets emphasize the matrix notation (Figure 4.6.1).

Figure 4.6.1 Coordinate map



Change of Basis 

There are many applications in which it is necessary to work with more than one coordinate system. In such 
cases it becomes important to know how the coordinates of a fixed vector relative to each coordinate system 
are related. This leads to the following problem. 

n 



The Change-of-Basis Problem 



If $\mathbf{v}$ is a vector in a finite-dimensional vector space $V$, and if we change the basis for $V$ from a basis $B$
to a basis $B'$, how are the coordinate vectors $[\mathbf{v}]_B$ and $[\mathbf{v}]_{B'}$ related?



Remark To solve this problem, it will be convenient to refer to $B$ as the "old basis" and $B'$ as the "new
basis." Thus, our objective is to find a relationship between the old and new coordinates of a fixed vector $\mathbf{v}$ in
$V$.



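For simplicity, we will solve this problem for two-dimensional spaces. The solution for $n$-dimensional spaces
is similar. Let

\[ B = \{\mathbf{u}_1, \mathbf{u}_2\} \quad \text{and} \quad B' = \{\mathbf{u}_1', \mathbf{u}_2'\} \]

be the old and new bases, respectively. We will need the coordinate vectors for the new basis vectors relative
to the old basis. Suppose they are

\[ [\mathbf{u}_1']_B = \begin{bmatrix} a \\ b \end{bmatrix} \quad \text{and} \quad [\mathbf{u}_2']_B = \begin{bmatrix} c \\ d \end{bmatrix} \tag{3} \]

That is,

\[ \begin{aligned}
\mathbf{u}_1' &= a\mathbf{u}_1 + b\mathbf{u}_2 \\
\mathbf{u}_2' &= c\mathbf{u}_1 + d\mathbf{u}_2
\end{aligned} \tag{4} \]

Now let $\mathbf{v}$ be any vector in $V$, and let

\[ [\mathbf{v}]_{B'} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} \tag{5} \]

be the new coordinate vector, so that

\[ \mathbf{v} = k_1\mathbf{u}_1' + k_2\mathbf{u}_2' \tag{6} \]

In order to find the old coordinates of $\mathbf{v}$, we must express $\mathbf{v}$ in terms of the old basis $B$. To do this, we
substitute 4 into 6. This yields

\[ \mathbf{v} = k_1(a\mathbf{u}_1 + b\mathbf{u}_2) + k_2(c\mathbf{u}_1 + d\mathbf{u}_2) \]

or

\[ \mathbf{v} = (k_1 a + k_2 c)\mathbf{u}_1 + (k_1 b + k_2 d)\mathbf{u}_2 \]

Thus, the old coordinate vector for $\mathbf{v}$ is

\[ [\mathbf{v}]_B = \begin{bmatrix} k_1 a + k_2 c \\ k_1 b + k_2 d \end{bmatrix} \]

which, by using 5, can be written as

\[ [\mathbf{v}]_B = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix}
 = \begin{bmatrix} a & c \\ b & d \end{bmatrix} [\mathbf{v}]_{B'} \]

This equation states that the old coordinate vector $[\mathbf{v}]_B$ results when we multiply the new coordinate vector
$[\mathbf{v}]_{B'}$ on the left by the matrix

\[ P = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \]

Since the columns of this matrix are the coordinates of the new basis vectors relative to the old basis [see 3],
we have the following solution of the change-of-basis problem.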



Solution of the Change-of-Basis Problem 

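If we change the basis for a vector space $V$ from an old basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ to a new basis
$B' = \{\mathbf{u}_1', \mathbf{u}_2', \ldots, \mathbf{u}_n'\}$, then for each vector $\mathbf{v}$ in $V$, the old coordinate vector $[\mathbf{v}]_B$ is related to the
new coordinate vector $[\mathbf{v}]_{B'}$ by the equation

\[ [\mathbf{v}]_B = P[\mathbf{v}]_{B'} \tag{7} \]

where the columns of $P$ are the coordinate vectors of the new basis vectors relative to the old basis;
that is, the column vectors of $P$ are

\[ [\mathbf{u}_1']_B, \quad [\mathbf{u}_2']_B, \quad \ldots, \quad [\mathbf{u}_n']_B \tag{8} \]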



Transition Matrices 

The matrix $P$ in Equation 7 is called the transition matrix from $B'$ to $B$. For emphasis, we will often denote it
by $P_{B' \to B}$. It follows from 8 that this matrix can be expressed in terms of its column vectors as

\[ P_{B' \to B} = \big[\, [\mathbf{u}_1']_B \mid [\mathbf{u}_2']_B \mid \cdots \mid [\mathbf{u}_n']_B \,\big] \tag{9} \]

Similarly, the transition matrix from $B$ to $B'$ can be expressed in terms of its column vectors as

\[ P_{B \to B'} = \big[\, [\mathbf{u}_1]_{B'} \mid [\mathbf{u}_2]_{B'} \mid \cdots \mid [\mathbf{u}_n]_{B'} \,\big] \tag{10} \]

Remark There is a simple way to remember both of these formulas using the terms "old basis" and "new
basis" defined earlier in this section: In Formula 9 the old basis is $B'$ and the new basis is $B$, whereas in
Formula 10 the old basis is $B$ and the new basis is $B'$. Thus, both formulas can be restated as follows:



The columns of the transition matrix from an old basis to a new basis are the coordinate vectors of the 
old basis relative to the new basis. 



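EXAMPLE 1 Finding Transition Matrices

Consider the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for $R^2$, where

\[ \mathbf{u}_1 = (1, 0), \quad \mathbf{u}_2 = (0, 1), \quad \mathbf{u}_1' = (1, 1), \quad \mathbf{u}_2' = (2, 1) \]

(a) Find the transition matrix $P_{B' \to B}$ from $B'$ to $B$.

(b) Find the transition matrix $P_{B \to B'}$ from $B$ to $B'$.

Solution

(a) Here the old basis vectors are $\mathbf{u}_1'$ and $\mathbf{u}_2'$ and the new basis vectors are $\mathbf{u}_1$ and $\mathbf{u}_2$. We want
to find the coordinate matrices of the old basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$ relative to the new basis
vectors $\mathbf{u}_1$ and $\mathbf{u}_2$. To do this, first we observe that

\[ \begin{aligned}
\mathbf{u}_1' &= \mathbf{u}_1 + \mathbf{u}_2 \\
\mathbf{u}_2' &= 2\mathbf{u}_1 + \mathbf{u}_2
\end{aligned} \]

from which it follows that

\[ [\mathbf{u}_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad [\mathbf{u}_2']_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \]

and hence that

\[ P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \]

(b) Here the old basis vectors are $\mathbf{u}_1$ and $\mathbf{u}_2$ and the new basis vectors are $\mathbf{u}_1'$ and $\mathbf{u}_2'$. As in part
(a), we want to find the coordinate matrices of the old basis vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ relative to
the new basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$. To do this, observe that

\[ \begin{aligned}
\mathbf{u}_1 &= -\mathbf{u}_1' + \mathbf{u}_2' \\
\mathbf{u}_2 &= 2\mathbf{u}_1' - \mathbf{u}_2'
\end{aligned} \]

from which it follows that

\[ [\mathbf{u}_1]_{B'} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad \text{and} \quad [\mathbf{u}_2]_{B'} = \begin{bmatrix} 2 \\ -1 \end{bmatrix} \]

and hence that

\[ P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix} \]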



Suppose now that $B$ and $B'$ are bases for a finite-dimensional vector space $V$. Since multiplication by $P_{B' \to B}$
maps coordinate vectors relative to the basis $B'$ into coordinate vectors relative to a basis $B$, and $P_{B \to B'}$ maps
coordinate vectors relative to $B$ into coordinate vectors relative to $B'$, it follows that for every vector $\mathbf{v}$ in $V$
we have

\[ [\mathbf{v}]_B = P_{B' \to B}[\mathbf{v}]_{B'} \tag{11} \]

\[ [\mathbf{v}]_{B'} = P_{B \to B'}[\mathbf{v}]_B \tag{12} \]



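EXAMPLE 2 Computing Coordinate Vectors

Let B and B' be the bases in Example 1. Use an appropriate formula to find $[\mathbf{v}]_B$ given that

$$[\mathbf{v}]_{B'} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}$$

Solution To find $[\mathbf{v}]_B$ we need to make the transition from B' to B. It follows from Formula
11 and part (a) of Example 1 that

$$[\mathbf{v}]_B = P_{B' \to B}\,[\mathbf{v}]_{B'} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 7 \\ 2 \end{bmatrix}$$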
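The computation in Example 2 can also be checked numerically. The following is a minimal sketch in Python, not part of the text's method; the helper name `mat_vec` and the variable names are illustrative assumptions, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list of entries)."""
    return [sum(Fraction(a) * b for a, b in zip(row, vector)) for row in matrix]

# Transition matrices found in Example 1.
P_Bprime_to_B = [[1, 2], [1, 1]]
P_B_to_Bprime = [[-1, 2], [1, -1]]

# Coordinate vector of v relative to B' (given in Example 2).
v_Bprime = [-3, 5]

# Formula 11: [v]_B = P_{B'->B} [v]_{B'}
v_B = mat_vec(P_Bprime_to_B, v_Bprime)

# Formula 12 takes us back: [v]_{B'} = P_{B->B'} [v]_B
v_back = mat_vec(P_B_to_Bprime, v_B)

print(v_B)     # coordinates of v relative to B
print(v_back)  # should reproduce v_Bprime
```

Applying Formula 12 to the result returns the original coordinate vector, which is a convenient sanity check on both transition matrices.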



Invertibility of Transition Matrices 

If B and B' are bases for a finite-dimensional vector space V, then 

$$(P_{B' \to B})(P_{B \to B'}) = P_{B \to B}$$

because multiplication by $(P_{B' \to B})(P_{B \to B'})$ first maps B-coordinates of a vector into B'-coordinates, and
then maps those B'-coordinates back into the original B-coordinates. Since the net effect of the two operations
is to leave each coordinate vector unchanged, we are led to conclude that $P_{B \to B}$ must be the identity matrix,
that is,

$$(P_{B' \to B})(P_{B \to B'}) = I \tag{13}$$

(we omit the formal proof). For example, for the transition matrices obtained in Example 1 we have

$$(P_{B' \to B})(P_{B \to B'}) = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$



It follows from Formula 13 that $P_{B' \to B}$ is invertible and that its inverse is $P_{B \to B'}$. Thus, we have the following
theorem.

THEOREM 4.6.1

If P is the transition matrix from a basis B' to a basis B for a finite-dimensional vector space V, then P
is invertible and $P^{-1}$ is the transition matrix from B to B'.



An Efficient Method for Computing Transition Matrices for $R^n$

Our next objective is to develop an efficient procedure for computing transition matrices between bases for
$R^n$. As illustrated in Example 1, the first step in computing a transition matrix is to express each old basis
vector as a linear combination of the new basis vectors. For $R^n$ this involves solving n linear systems of n
equations in n unknowns, each of which has the same coefficient matrix (why?). An efficient way to do this is
by the method illustrated in Example 2 of Section 1.6, which is as follows:

A Procedure for Computing $P_{B \to B'}$

Step 1 Form the matrix $[B' \mid B]$.

Step 2 Use elementary row operations to reduce the matrix in Step 1 to reduced row echelon form.

Step 3 The resulting matrix will be $[I \mid P_{B \to B'}]$.

Step 4 Extract the matrix $P_{B \to B'}$ from the right side of the matrix in Step 3.

This procedure is captured in the following diagram.

$$[\,\text{new basis} \mid \text{old basis}\,] \;\xrightarrow{\text{row operations}}\; [\,I \mid \text{transition from old to new}\,] \tag{14}$$
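The four-step procedure can be sketched in code. The following Python sketch is illustrative only: the function names `rref` and `transition_matrix` are assumptions introduced here, basis vectors are passed as plain lists, and exact fractions are used in place of floating point. It forms [new basis | old basis], reduces it to reduced row echelon form, and reads the transition matrix off the right-hand block.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if src is None:
            continue
        A[pivot_row], A[src] = A[src], A[pivot_row]
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]  # create the leading 1
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

def transition_matrix(old_basis, new_basis):
    """Transition matrix from old_basis to new_basis (both bases for R^n)."""
    n = len(new_basis)
    # Step 1: basis vectors become the columns of [new | old].
    augmented = [
        [new_basis[j][i] for j in range(n)] + [old_basis[j][i] for j in range(n)]
        for i in range(n)
    ]
    # Steps 2-3: reduce to [I | P].
    reduced = rref(augmented)
    # Step 4: extract the right-hand block.
    return [row[n:] for row in reduced]

B = [[1, 0], [0, 1]]        # u1, u2 from Example 1
B_prime = [[1, 1], [2, 1]]  # u1', u2' from Example 1

P_Bprime_to_B = transition_matrix(old_basis=B_prime, new_basis=B)
P_B_to_Bprime = transition_matrix(old_basis=B, new_basis=B_prime)
print(P_Bprime_to_B)
print(P_B_to_Bprime)
```

The sketch assumes that `new_basis` really is a basis, so that the left block reduces to the identity; it does not check this.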

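EXAMPLE 3 Example 1 Revisited

In Example 1 we considered the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{u}'_1, \mathbf{u}'_2\}$ for $R^2$, where

$$\mathbf{u}_1 = (1, 0), \quad \mathbf{u}_2 = (0, 1), \quad \mathbf{u}'_1 = (1, 1), \quad \mathbf{u}'_2 = (2, 1)$$

(a) Use Formula 14 to find the transition matrix from B' to B.

(b) Use Formula 14 to find the transition matrix from B to B'.

Solution

(a) Here B' is the old basis and B is the new basis, so

$$[\,\text{new basis} \mid \text{old basis}\,] = \left[\begin{array}{cc|cc} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \end{array}\right]$$

Since the left side is already the identity matrix, no reduction is needed. We see by
inspection that the transition matrix is

$$P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$$

which agrees with the result in Example 1.

(b) Here B is the old basis and B' is the new basis, so

$$[\,\text{new basis} \mid \text{old basis}\,] = \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{array}\right]$$

By reducing this matrix so that the left side becomes the identity, we obtain (verify)

$$[\,I \mid \text{transition from old to new}\,] = \left[\begin{array}{cc|cc} 1 & 0 & -1 & 2 \\ 0 & 1 & 1 & -1 \end{array}\right]$$

so the transition matrix is

$$P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}$$

which also agrees with the result in Example 1.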

Transition to the Standard Basis for $R^n$

Note that in part (a) of the last example the column vectors of the matrix that made the transition from the
basis B' to the standard basis turned out to be the vectors in B' written in column form. This illustrates the
following general result.

THEOREM 4.6.2

Let $B' = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be any basis for the vector space $R^n$ and let $S = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be the
standard basis for $R^n$. If the vectors in these bases are written in column form, then

$$P_{B' \to S} = [\,\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n\,] \tag{15}$$

It follows from this theorem that if

$$A = [\,\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n\,]$$

is any invertible $n \times n$ matrix, then A can be viewed as the transition matrix from the basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$
for $R^n$ to the standard basis for $R^n$. Thus, for example, the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$$

which was shown to be invertible in Example 4 of Section 1.5, is the transition matrix from the basis

$$\mathbf{u}_1 = (1, 2, 1), \quad \mathbf{u}_2 = (2, 5, 0), \quad \mathbf{u}_3 = (3, 3, 8)$$

to the basis

$$\mathbf{e}_1 = (1, 0, 0), \quad \mathbf{e}_2 = (0, 1, 0), \quad \mathbf{e}_3 = (0, 0, 1)$$



Concept Review 

• Coordinate map 

• Change-of-basis problem 

• Transition matrix 

Skills 

• Find coordinate vectors relative to a given basis directly. 

• Find the transition matrix from one basis to another. 

• Use the transition matrix to compute coordinate vectors. 



Exercise Set 4.6 

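1. Find the coordinate vector for w relative to the basis $S = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$.

(a) $\mathbf{u}_1 = (1, 0)$, $\mathbf{u}_2 = (0, 1)$; $\mathbf{w} = (3, -7)$

(b) $\mathbf{u}_1 = (2, -4)$, $\mathbf{u}_2 = (3, 8)$; $\mathbf{w} = (1, 1)$

(c) $\mathbf{u}_1 = (1, 1)$, $\mathbf{u}_2 = (0, 2)$; $\mathbf{w} = (a, b)$

Answer:

(a) $[\mathbf{w}]_S = \begin{bmatrix} 3 \\ -7 \end{bmatrix}$

(b) $[\mathbf{w}]_S = \begin{bmatrix} \tfrac{5}{28} \\ \tfrac{3}{14} \end{bmatrix}$

(c) $[\mathbf{w}]_S = \begin{bmatrix} a \\ \tfrac{b - a}{2} \end{bmatrix}$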

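2. Find the coordinate vector for v relative to the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$.

(a) $\mathbf{v} = (2, -1, 3)$; $\mathbf{v}_1 = (1, 0, 0)$, $\mathbf{v}_2 = (2, 2, 0)$, $\mathbf{v}_3 = (3, 3, 3)$

(b) $\mathbf{v} = (5, -12, 3)$; $\mathbf{v}_1 = (1, 2, 3)$, $\mathbf{v}_2 = (-4, 5, 6)$, $\mathbf{v}_3 = (7, -8, 9)$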

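3. Find the coordinate vector for p relative to the basis $S = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ for $P_2$.

(a) $\mathbf{p} = 4 - 3x + x^2$; $\mathbf{p}_1 = 1$, $\mathbf{p}_2 = x$, $\mathbf{p}_3 = x^2$

(b) $\mathbf{p} = 2 - x + x^2$; $\mathbf{p}_1 = 1 + x$, $\mathbf{p}_2 = 1 + x^2$, $\mathbf{p}_3 = x + x^2$

Answer:

(a) $(\mathbf{p})_S = (4, -3, 1)$, $[\mathbf{p}]_S = \begin{bmatrix} 4 \\ -3 \\ 1 \end{bmatrix}$

(b) $(\mathbf{p})_S = (0, 2, -1)$, $[\mathbf{p}]_S = \begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}$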

4. Find the coordinate vector for A relative to the basis $S = \{A_1, A_2, A_3, A_4\}$ for $M_{22}$.



A = 



^3=[i 0} ^=[0 1] 



5. Consider the coordinate vectors 





6" 




"3" 




-1 


. [q]5= 


0 




4 




4 



(a) Find w if S is the basis in Exercise 2(a).

(b) Find q if S is the basis in Exercise 3(a).

(c) Find B if S is the basis in Exercise 4.

Answer:

(a) $\mathbf{w} = (16, 10, 12)$

(b) $\mathbf{q} = 3 + 4x^2$



6. Consider the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{u}'_1, \mathbf{u}'_2\}$ for $R^2$, where



0 




1 





(a) Find the transition matrix from B' to B.

(b) Find the transition matrix from B to B'.

(c) Compute the coordinate vector $[\mathbf{w}]_B$, where



^5 



and use Formula 10 to compute $[\mathbf{w}]_{B'}$.

(d) Check your work by computing $[\mathbf{w}]_{B'}$ directly.

7. Repeat the directions of Exercise 6 with the same vector w but with



-=H- -[-tj- H 



Answer 

(a) 



11 
10 

2 
5 



-4 0 



(b) 



(c) 



0 ■ 
-2 - 

[w]b = 



5 
"2 

11 
2 



il 
10 

8 
5 



[w]b' = 



-4 
-7 



8. Consider the bases $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B' = \{\mathbf{u}'_1, \mathbf{u}'_2, \mathbf{u}'_3\}$ for $R^3$, where





■-3" 




■-3' 




r 


ui = 


0 




2 




6 








-1 




-1 




■-6' 




"-2" 




■-2' 








-6 




-3 




0 




4 




7 



(a) Find the transition matrix from B to B'.

(b) Compute the coordinate vector $[\mathbf{w}]_B$, where



^5 

8 



and use Formula 12 to compute $[\mathbf{w}]_{B'}$.

(c) Check your work by computing $[\mathbf{w}]_{B'}$ directly.

9. Repeat the directions of Exercise 8 with the same vector w, but with





'2 




2' 




T 




1 




-1 


. 13 = 


2 




1 




1 




1 





3" 




r 








1 

-5 




1 

-3 




0 
2 



Answer: 

(a) 



3 2 I 
-2 -3 -i 



(b) 



[w]b = 



9 
-9 
5 



. [w]b' = 



_7 

2 
23 

2 

6 



10. Consider the bases $B = \{\mathbf{p}_1, \mathbf{p}_2\}$ and $B' = \{\mathbf{q}_1, \mathbf{q}_2\}$ for $P_1$, where

$$\mathbf{p}_1 = 6 + 3x, \quad \mathbf{p}_2 = 10 + 2x, \quad \mathbf{q}_1 = 2, \quad \mathbf{q}_2 = 3 + 2x$$

(a) Find the transition matrix from B' to B.

(b) Find the transition matrix from B to B'.

(c) Compute the coordinate vector $[\mathbf{p}]_B$, where $\mathbf{p} = -4 + x$, and use Formula 12 to compute $[\mathbf{p}]_{B'}$.

(d) Check your work by computing $[\mathbf{p}]_{B'}$ directly.

11. Let V be the space spanned by $\mathbf{f}_1 = \sin x$ and $\mathbf{f}_2 = \cos x$.

(a) Show that $\mathbf{g}_1 = 2\sin x + \cos x$ and $\mathbf{g}_2 = 3\cos x$ form a basis for V.

(b) Find the transition matrix from $B' = \{\mathbf{g}_1, \mathbf{g}_2\}$ to $B = \{\mathbf{f}_1, \mathbf{f}_2\}$.

(c) Find the transition matrix from B to B'.

(d) Compute the coordinate vector $[\mathbf{h}]_B$, where $\mathbf{h} = 2\sin x - 5\cos x$, and use Formula 12 to obtain $[\mathbf{h}]_{B'}$.

(e) Check your work by computing $[\mathbf{h}]_{B'}$ directly.

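Answer:

(b) $\begin{bmatrix} 2 & 0 \\ 1 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} \tfrac{1}{2} & 0 \\ -\tfrac{1}{6} & \tfrac{1}{3} \end{bmatrix}$

(d) $[\mathbf{h}]_B = \begin{bmatrix} 2 \\ -5 \end{bmatrix}$, $[\mathbf{h}]_{B'} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$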

12. Let S be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis in which $\mathbf{v}_1 = (2, 1)$ and
$\mathbf{v}_2 = (-3, 4)$.

(a) Find the transition matrix $P_{B \to S}$ by inspection.

(b) Use Formula 14 to find the transition matrix $P_{S \to B}$.

(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.

(d) Let $\mathbf{w} = (5, -3)$. Find $[\mathbf{w}]_B$ and then use Formula 11 to compute $[\mathbf{w}]_S$.

(e) Let $\mathbf{w} = (3, -5)$. Find $[\mathbf{w}]_S$ and then use Formula 12 to compute $[\mathbf{w}]_B$.

13. Let S be the standard basis for $R^3$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be the basis in which $\mathbf{v}_1 = (1, 2, 1)$,
$\mathbf{v}_2 = (2, 5, 0)$, and $\mathbf{v}_3 = (3, 3, 8)$.

(a) Find the transition matrix $P_{B \to S}$ by inspection.

(b) Use Formula 14 to find the transition matrix $P_{S \to B}$.

(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.

(d) Let $\mathbf{w} = (5, -3, 1)$. Find $[\mathbf{w}]_B$ and then use Formula 11 to compute $[\mathbf{w}]_S$.

(e) Let $\mathbf{w} = (3, -5, 0)$. Find $[\mathbf{w}]_S$ and then use Formula 12 to compute $[\mathbf{w}]_B$.

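Answer:

(a) $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$

(b) $\begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$

(d) $[\mathbf{w}]_B = \begin{bmatrix} -239 \\ 77 \\ 30 \end{bmatrix}$, $[\mathbf{w}]_S = \begin{bmatrix} 5 \\ -3 \\ 1 \end{bmatrix}$

(e) $[\mathbf{w}]_S = \begin{bmatrix} 3 \\ -5 \\ 0 \end{bmatrix}$, $[\mathbf{w}]_B = \begin{bmatrix} -200 \\ 64 \\ 25 \end{bmatrix}$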

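14. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the bases for $R^2$ in which
$\mathbf{u}_1 = (2, 2)$, $\mathbf{u}_2 = (4, -1)$, $\mathbf{v}_1 = (1, 3)$, and $\mathbf{v}_2 = (-1, -1)$.

(a) Use Formula 14 to find the transition matrix $P_{B_2 \to B_1}$.

(b) Use Formula 14 to find the transition matrix $P_{B_1 \to B_2}$.

(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.

(d) Let $\mathbf{w} = (5, -3)$. Find $[\mathbf{w}]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[\mathbf{w}]_{B_2}$ from $[\mathbf{w}]_{B_1}$.

(e) Let $\mathbf{w} = (3, -5)$. Find $[\mathbf{w}]_{B_2}$ and then use the matrix $P_{B_2 \to B_1}$ to compute $[\mathbf{w}]_{B_1}$ from $[\mathbf{w}]_{B_2}$.

15. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the bases for $R^2$ in which $\mathbf{u}_1 = (1, 2)$, $\mathbf{u}_2 = (2, 3)$,
$\mathbf{v}_1 = (1, 3)$, and $\mathbf{v}_2 = (1, 4)$.

(a) Use Formula 14 to find the transition matrix $P_{B_2 \to B_1}$.

(b) Use Formula 14 to find the transition matrix $P_{B_1 \to B_2}$.

(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.

(d) Let $\mathbf{w} = (0, 1)$. Find $[\mathbf{w}]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[\mathbf{w}]_{B_2}$ from $[\mathbf{w}]_{B_1}$.

(e) Let $\mathbf{w} = (2, 5)$. Find $[\mathbf{w}]_{B_2}$ and then use the matrix $P_{B_2 \to B_1}$ to compute $[\mathbf{w}]_{B_1}$ from $[\mathbf{w}]_{B_2}$.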



Answer: 




16. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be the bases for $R^3$ in which $\mathbf{u}_1 = (-3, 0, -3)$,
$\mathbf{u}_2 = (-3, 2, -1)$, $\mathbf{u}_3 = (1, 6, -1)$, $\mathbf{v}_1 = (-6, -6, 0)$, $\mathbf{v}_2 = (-2, -6, 4)$, and
$\mathbf{v}_3 = (-2, -3, 7)$.

(a) Find the transition matrix $P_{B_1 \to B_2}$.

(b) Let $\mathbf{w} = (-5, 8, -5)$. Find $[\mathbf{w}]_{B_1}$ and then use the transition matrix obtained in part (a) to
compute $[\mathbf{w}]_{B_2}$ by matrix multiplication.

(c) Check the result in part (b) by computing $[\mathbf{w}]_{B_2}$ directly.

17. Follow the directions of Exercise 16 with the same vector w but with $\mathbf{u}_1 = (2, 1, 1)$, $\mathbf{u}_2 = (2, -1, 1)$,
$\mathbf{u}_3 = (1, 2, 1)$, $\mathbf{v}_1 = (3, 1, -5)$, $\mathbf{v}_2 = (1, 1, -3)$, and $\mathbf{v}_3 = (-1, 0, 2)$.

Answer: 

5" 
2 
1 
2 
6 





(b) 



9 
-9 
-5 



2 
23 

2 

6 



18. Let $S = \{\mathbf{e}_1, \mathbf{e}_2\}$ be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis that results when the
vectors in S are reflected about the line y = x.

(a) Find the transition matrix $P_{B \to S}$.

(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

19. Let $S = \{\mathbf{e}_1, \mathbf{e}_2\}$ be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis that results when the
vectors in S are reflected about the line that makes an angle $\theta$ with the positive x-axis.

(a) Find the transition matrix $P_{B \to S}$.

(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

Answer: 



(a) $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$

20. If Bi, B2, and £3 are bases for H^, and if 

PBi-*B2 = ^ 

then P£2_^B I = 



21. If P is the transition matrix from a basis B' to a basis B, and Q is the transition matrix from B to a basis C,
what is the transition matrix from B' to C? What is the transition matrix from C to B'?

22. To write the coordinate vector for a vector, it is necessary to specify an order for the vectors in the basis. If
P is the transition matrix from a basis B' to a basis B, what is the effect on P if we reverse the order of
vectors in B from $\mathbf{v}_1, \ldots, \mathbf{v}_n$ to $\mathbf{v}_n, \ldots, \mathbf{v}_1$? What is the effect on P if we reverse the order of vectors in
both B' and B?



23. Consider the matrix 



P = 



1 1 0 
1 0 2 
0 2 1 



(a) P is the transition matrix from what basis B to the standard basis $S = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ for $R^3$?

(b) P is the transition matrix from the standard basis $S = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ to what basis B for $R^3$?

Answer:

(a) $B = \{(1, 1, 0), (1, 0, 2), (0, 2, 1)\}$



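24. The matrix

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 2 \\ 0 & 1 & 1 \end{bmatrix}$$

is the transition matrix from what basis B to the basis $\{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$ for $R^3$?

25. Let B be a basis for $R^n$. Prove that the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ form a linearly independent set in $R^n$ if and
only if the vectors $[\mathbf{v}_1]_B, [\mathbf{v}_2]_B, \ldots, [\mathbf{v}_k]_B$ form a linearly independent set in $R^n$.

26. Let B be a basis for $R^n$. Prove that the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ span $R^n$ if and only if the vectors
$[\mathbf{v}_1]_B, [\mathbf{v}_2]_B, \ldots, [\mathbf{v}_k]_B$ span $R^n$.

27. If $[\mathbf{w}]_B = \mathbf{w}$ holds for all vectors w in $R^n$, what can you say about the basis B?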

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer.

(a) If $B_1$ and $B_2$ are bases for a vector space V, then there exists a transition matrix from $B_1$ to $B_2$.
Answer:

True

(b) Transition matrices are invertible.
Answer:

True

(c) If B is a basis for a vector space $R^n$, then $P_{B \to B}$ is the identity matrix.
Answer:

True

(d) If $P_{B_1 \to B_2}$ is a diagonal matrix, then each vector in $B_2$ is a scalar multiple of some vector in $B_1$.
Answer:

True

(e) If each vector in $B_2$ is a scalar multiple of some vector in $B_1$, then $P_{B_1 \to B_2}$ is a diagonal matrix.
Answer:

False

(f) If A is a square matrix, then $A = P_{B_1 \to B_2}$ for some bases $B_1$ and $B_2$ for $R^n$.
Answer:

False
4.7 Row Space, Column Space, and Null Space



In this section we will study some important vector spaces that are associated with matrices. Our work here will provide 
us with a deeper understanding of the relationships between the solutions of a linear system and properties of its 
coefficient matrix. 



Row Space, Column Space, and Null Space 

Recall that vectors can be written in comma-delimited form or in matrix form as either row vectors or column vectors. 
In this section we will use the latter two. 

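DEFINITION 1

For an $m \times n$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

the vectors

$$\mathbf{r}_1 = [a_{11} \;\; a_{12} \;\; \cdots \;\; a_{1n}], \quad \mathbf{r}_2 = [a_{21} \;\; a_{22} \;\; \cdots \;\; a_{2n}], \quad \ldots, \quad \mathbf{r}_m = [a_{m1} \;\; a_{m2} \;\; \cdots \;\; a_{mn}]$$

in $R^n$ that are formed from the rows of A are called the row vectors of A, and the vectors

$$\mathbf{c}_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad \mathbf{c}_n = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$

in $R^m$ formed from the columns of A are called the column vectors of A.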



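EXAMPLE 1 Row and Column Vectors of a 2 × 3 Matrix

Let

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & -1 & 4 \end{bmatrix}$$

The row vectors of A are

$$\mathbf{r}_1 = [2 \;\; 1 \;\; 0] \quad\text{and}\quad \mathbf{r}_2 = [3 \;\; -1 \;\; 4]$$

and the column vectors of A are

$$\mathbf{c}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad\text{and}\quad \mathbf{c}_3 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}$$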



The following definition defines three important vector spaces associated with a matrix. 



DEFINITION 2 



If A is an $m \times n$ matrix, then the subspace of $R^n$ spanned by the row vectors of A is called the row space of A,
and the subspace of $R^m$ spanned by the column vectors of A is called the column space of A. The solution space
of the homogeneous system of equations $A\mathbf{x} = \mathbf{0}$, which is a subspace of $R^n$, is called the null space of A.



In this section and the next we will be concerned with two general questions: 

Question 1. What relationships exist among the solutions of a linear system $A\mathbf{x} = \mathbf{b}$ and the row space, column space,
and null space of the coefficient matrix A?

Question 2. What relationships exist among the row space, column space, and null space of a matrix? 



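Starting with the first question, suppose that

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

It follows from Formula 10 of Section 1.3 that if $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ denote the column vectors of A, then the product $A\mathbf{x}$ can
be expressed as a linear combination of these vectors with coefficients from x; that is,

$$A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n \tag{1}$$

Thus, a linear system, $A\mathbf{x} = \mathbf{b}$, of m equations in n unknowns can be written as

$$x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n = \mathbf{b} \tag{2}$$

from which we conclude that $A\mathbf{x} = \mathbf{b}$ is consistent if and only if b is expressible as a linear combination of the column
vectors of A. This yields the following theorem.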



THEOREM 4.7.1 

A system of linear equations Ax = b is consistent if and only if b is in the column space of A. 



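EXAMPLE 2 A Vector b in the Column Space of A

Let $A\mathbf{x} = \mathbf{b}$ be the linear system

$$\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$

Show that b is in the column space of A by expressing it as a linear combination of the column vectors of
A.

Solution Solving the system by Gaussian elimination yields (verify)

$$x_1 = 2, \quad x_2 = -1, \quad x_3 = 3$$

It follows from this and Formula 2 that

$$2\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + 3\begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$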
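Theorem 4.7.1 suggests a mechanical test: reduce the augmented matrix [A | b] and check whether a leading 1 appears in the last column. The following Python sketch is one way to do this; the function names `rref` and `column_space_coefficients` are illustrative assumptions, and exact fractions are used to avoid rounding. When b lies in the column space, the function returns one set of coefficients (with any free variables set to 0); otherwise it returns `None`.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if src is None:
            continue
        A[pivot_row], A[src] = A[src], A[pivot_row]
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]  # create the leading 1
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

def column_space_coefficients(A, b):
    """Return x with Ax = b (free variables set to 0), or None if b is not in col(A)."""
    n = len(A[0])
    reduced = rref([list(row) + [bi] for row, bi in zip(A, b)])
    x = [Fraction(0)] * n
    for row in reduced:
        lead = next((j for j, value in enumerate(row) if value != 0), None)
        if lead is None:
            continue          # zero row: no information
        if lead == n:
            return None       # row [0 ... 0 | 1]: the system is inconsistent
        x[lead] = row[n]      # leading variable equals the right-hand side
    return x

A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
b = [1, -9, -3]
coefficients = column_space_coefficients(A, b)
print(coefficients)  # coefficients of c1, c2, c3 in the expression for b
```

For the system of Example 2 the returned coefficients are the solution found by Gaussian elimination, so b is the corresponding combination of the columns of A.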



Recall from Theorem 3.4.4 that the general solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding any
specific solution of this system to the general solution of the corresponding homogeneous system $A\mathbf{x} = \mathbf{0}$. Keeping in
mind that the null space of A is the same as the solution space of $A\mathbf{x} = \mathbf{0}$, we can rephrase that theorem in the following
vector form.

THEOREM 4.7.2

If $\mathbf{x}_0$ is any solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$, and if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a basis for the null
space of A, then every solution of $A\mathbf{x} = \mathbf{b}$ can be expressed in the form

$$\mathbf{x} = \mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \tag{3}$$

Conversely, for all choices of scalars $c_1, c_2, \ldots, c_k$, the vector x in this formula is a solution of $A\mathbf{x} = \mathbf{b}$.

Equation 3 gives a formula for the general solution of $A\mathbf{x} = \mathbf{b}$. The vector $\mathbf{x}_0$ in that formula is called a particular
solution of $A\mathbf{x} = \mathbf{b}$, and the remaining part of the formula is called the general solution of $A\mathbf{x} = \mathbf{0}$. In words, this
formula tells us that:

The general solution of a consistent linear system can be expressed as the sum of a particular solution of that system 
and the general solution of the corresponding homogeneous system. 
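The converse half of Theorem 4.7.2 can be checked in one line. The following is a short verification sketch, using only the linearity of matrix multiplication together with the hypotheses $A\mathbf{x}_0 = \mathbf{b}$ and $A\mathbf{v}_i = \mathbf{0}$ for each basis vector of the null space:

```latex
A\mathbf{x}
  = A(\mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k)
  = A\mathbf{x}_0 + c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2 + \cdots + c_k A\mathbf{v}_k
  = \mathbf{b} + c_1\mathbf{0} + c_2\mathbf{0} + \cdots + c_k\mathbf{0}
  = \mathbf{b}
```

The other direction (that every solution has this form) uses the observation that if x is any solution, then $A(\mathbf{x} - \mathbf{x}_0) = \mathbf{b} - \mathbf{b} = \mathbf{0}$, so $\mathbf{x} - \mathbf{x}_0$ lies in the null space and is a linear combination of the basis vectors.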



Geometrically, the solution set of $A\mathbf{x} = \mathbf{b}$ can be viewed as the translation by $\mathbf{x}_0$ of the solution space of $A\mathbf{x} = \mathbf{0}$ (Figure
4.7.1).

Figure 4.7.1 (labeled region: solution space of $A\mathbf{x} = \mathbf{0}$)



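EXAMPLE 3 General Solution of a Linear System Ax = b

In the concluding subsection of Section 3.4 we compared solutions of the linear systems

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix}$$

and deduced that the general solution x of the nonhomogeneous system and the general solution $\mathbf{x}_h$ of the
corresponding homogeneous system (when written in column-vector form) are related by

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} -3r - 4s - 2t \\ r \\ -2s \\ s \\ t \\ \tfrac{1}{3} \end{bmatrix} = \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}}_{\mathbf{x}_0} + \underbrace{r\begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{x}_h}$$

Recall from the Remark following Example 4 of Section 4.5 that the vectors in $\mathbf{x}_h$ form a basis for the solution space of
$A\mathbf{x} = \mathbf{0}$.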

Bases for Row Spaces, Column Spaces, and Null Spaces 



We first developed elementary row operations for the purpose of solving linear systems, and we know from that work 
that performing an elementary row operation on an augmented matrix does not change the solution set of the 
corresponding linear system. It follows that applying an elementary row operation to a matrix A does not change the 
solution set of the corresponding linear system $A\mathbf{x} = \mathbf{0}$, or, stated another way, it does not change the null space of A.
Thus we have the following theorem.



THEOREM 4.7.3 

Elementary row operations do not change the null space of a matrix. 



The following theorem, whose proof is left as an exercise, is a companion to Theorem 4.7.3. 



THEOREM 4.7.4 

Elementary row operations do not change the row space of a matrix. 



Theorems 4.7.3 and 4.7.4 might tempt you into incorrectly believing that elementary row operations do not change the
column space of a matrix. To see why this is not true, compare the matrices

$$A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}$$

The matrix B can be obtained from A by adding -2 times the first row to the second. However, this operation has
changed the column space of A, since that column space consists of all scalar multiples of

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

whereas the column space of B consists of all scalar multiples of

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

and the two are different spaces.



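EXAMPLE 4 Finding a Basis for the Null Space of a Matrix

Find a basis for the null space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$

Solution The null space of A is the solution space of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$, which, as
shown in Example 3, has the basis

$$\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

Remark Observe that the basis vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ in the last example are the vectors that result by successively
setting one of the parameters in the general solution equal to 1 and the others equal to 0.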
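The observation in the Remark translates directly into a procedure: reduce A, identify the free variables, and for each one set that parameter to 1 and the others to 0. The following Python sketch illustrates this under stated assumptions; the names `rref` and `null_space_basis` are hypothetical helpers introduced here, and exact fractions stand in for hand arithmetic.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if src is None:
            continue
        A[pivot_row], A[src] = A[src], A[pivot_row]
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]  # create the leading 1
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

def null_space_basis(A):
    """Basis for the null space of A: one vector per free variable."""
    n = len(A[0])
    R = rref(A)
    # Column index of the leading 1 in each nonzero row (row i <-> pivots[i]).
    pivots = []
    for row in R:
        lead = next((j for j, value in enumerate(row) if value != 0), None)
        if lead is not None:
            pivots.append(lead)
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)            # this parameter is 1, the others are 0
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]           # solve row i for its leading variable
        basis.append(v)
    return basis

A = [
    [1, 3, -2, 0, 2, 0],
    [2, 6, -5, -2, 4, -3],
    [0, 0, 5, 10, 0, 15],
    [2, 6, 0, 8, 4, 18],
]
basis = null_space_basis(A)
for v in basis:
    print(v)
```

For the matrix of Example 4 the three vectors produced correspond to the parameters r, s, and t of Example 3.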



The following theorem makes it possible to find bases for the row and column spaces of a matrix in row echelon form 
by inspection. 



THEOREM 4.7.5 

If a matrix R is in row echelon form, then the row vectors with the leading 1's (the nonzero row vectors) form a
basis for the row space of R, and the column vectors with the leading 1's of the row vectors form a basis for the
column space of R.

The proof involves little more than an analysis of the positions of the 0's and 1's of R. We omit the details.



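EXAMPLE 5 Bases for Row and Column Spaces

The matrix

$$R = \begin{bmatrix} 1 & -2 & 5 & 0 & 3 \\ 0 & 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is in row echelon form. From Theorem 4.7.5, the vectors

$$\mathbf{r}_1 = [1 \;\; -2 \;\; 5 \;\; 0 \;\; 3], \quad \mathbf{r}_2 = [0 \;\; 1 \;\; 3 \;\; 0 \;\; 0], \quad \mathbf{r}_3 = [0 \;\; 0 \;\; 0 \;\; 1 \;\; 0]$$

form a basis for the row space of R, and the vectors

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for the column space of R.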



EXAMPLE 6 Basis for a Row Space by Row Reduction

Find a basis for the row space of the matrix

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$

Solution Since elementary row operations do not change the row space of a matrix, we can find a basis
for the row space of A by finding a basis for the row space of any row echelon form of A. Reducing A to
row echelon form, we obtain (verify)

$$R = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 0 & 0 & 1 & 3 & -2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

By Theorem 4.7.5, the nonzero row vectors of R form a basis for the row space of R and hence form a
basis for the row space of A. These basis vectors are

$$\mathbf{r}_1 = [1 \;\; -3 \;\; 4 \;\; -2 \;\; 5 \;\; 4]$$

$$\mathbf{r}_2 = [0 \;\; 0 \;\; 1 \;\; 3 \;\; -2 \;\; -6]$$

$$\mathbf{r}_3 = [0 \;\; 0 \;\; 0 \;\; 0 \;\; 1 \;\; 5]$$



The problem of finding a basis for the column space of a matrix A in Example 6 is complicated by the fact that an
elementary row operation can alter its column space. However, the good news is that elementary row operations do not
alter dependence relationships among the column vectors. To make this more precise, suppose that $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$ are
linearly dependent column vectors of A, so there are scalars $c_1, c_2, \ldots, c_k$ that are not all zero and such that

$$c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_k\mathbf{w}_k = \mathbf{0} \tag{4}$$

If we perform an elementary row operation on A, then these vectors will be changed into new column vectors
$\mathbf{w}'_1, \mathbf{w}'_2, \ldots, \mathbf{w}'_k$. At first glance it would seem possible that the transformed vectors might be linearly independent.
However, this is not so, since it can be proved that these new column vectors will be linearly dependent and, in fact,
related by an equation

$$c_1\mathbf{w}'_1 + c_2\mathbf{w}'_2 + \cdots + c_k\mathbf{w}'_k = \mathbf{0}$$

that has exactly the same coefficients as Formula 4. It follows from the fact that elementary row operations are reversible that
they also preserve linear independence among column vectors (why?). The following theorem summarizes all of these
results.
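One way to see why the coefficients are unchanged is sketched below. It assumes the fact that performing an elementary row operation on A is the same as multiplying A on the left by an elementary matrix, called E here purely for illustration, so that each new column is $\mathbf{w}'_i = E\mathbf{w}_i$:

```latex
c_1\mathbf{w}'_1 + c_2\mathbf{w}'_2 + \cdots + c_k\mathbf{w}'_k
  = c_1 E\mathbf{w}_1 + c_2 E\mathbf{w}_2 + \cdots + c_k E\mathbf{w}_k
  = E\,(c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_k\mathbf{w}_k)
  = E\,\mathbf{0}
  = \mathbf{0}
```

Because E is invertible, the same argument applied with $E^{-1}$ shows that any dependence relation among the new columns is also a relation among the original ones, which is why independence is preserved as well.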



THEOREM 4.7.6 

If A and B are row equivalent matrices, then: 

(a) A given set of column vectors of A is linearly independent if and only if the corresponding column vectors 
of B are linearly independent. 



(b) A given set of column vectors of A forms a basis for the column space of A if and only if the corresponding 
column vectors of B form a basis for the column space of B. 



EXAMPLE 7 Basis for a Column Space by Row Reduction

Find a basis for the column space of the matrix

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$

Solution We observed in Example 6 that the matrix

$$R = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 0 & 0 & 1 & 3 & -2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is a row echelon form of A. Keeping in mind that A and R can have different column spaces, we cannot
find a basis for the column space of A directly from the column vectors of R. However, it follows from
Theorem 4.7.6b that if we can find a set of column vectors of R that forms a basis for the column space of
R, then the corresponding column vectors of A will form a basis for the column space of A.

Since the first, third, and fifth columns of R contain the leading 1's of the row vectors, the vectors

$$\mathbf{c}'_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}'_3 = \begin{bmatrix} 4 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}'_5 = \begin{bmatrix} 5 \\ -2 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for the column space of R. Thus, the corresponding column vectors of A, which are

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \\ -1 \end{bmatrix}, \quad \mathbf{c}_3 = \begin{bmatrix} 4 \\ 9 \\ 9 \\ -4 \end{bmatrix}, \quad \mathbf{c}_5 = \begin{bmatrix} 5 \\ 8 \\ 9 \\ -5 \end{bmatrix}$$

form a basis for the column space of A.

Up to now we have focused on methods for finding bases associated with matrices. Those methods can readily be
adapted to the more general problem of finding a basis for the space spanned by a set of vectors in $R^n$.

EXAMPLE 8 Basis for a Vector Space Using Row Operations

Find a basis for the subspace of $R^5$ spanned by the vectors

$$\mathbf{v}_1 = (1, -2, 0, 0, 3), \quad \mathbf{v}_2 = (2, -5, -3, -2, 6),$$
$$\mathbf{v}_3 = (0, 5, 15, 10, 0), \quad \mathbf{v}_4 = (2, 6, 18, 8, 6)$$

Solution The space spanned by these vectors is the row space of the matrix

$$\begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}$$

Reducing this matrix to row echelon form, we obtain

$$\begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 0 & 1 & 3 & 2 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

The nonzero row vectors in this matrix are

$$\mathbf{w}_1 = (1, -2, 0, 0, 3), \quad \mathbf{w}_2 = (0, 1, 3, 2, 0), \quad \mathbf{w}_3 = (0, 0, 1, 1, 0)$$

These vectors form a basis for the row space and consequently form a basis for the subspace of $R^5$
spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$.



Bases Formed from Row and Column Vectors of a Matrix 

In all of the examples we have considered thus far we have looked for bases in which no restrictions were imposed on
the individual vectors in the basis. We now want to focus on the problem of finding a basis for the row space of a matrix
A consisting entirely of row vectors from A and a basis for the column space of A consisting entirely of column vectors
of A.

Looking back on our earlier work, we see that the procedure followed in Example 7 did, in fact, produce a basis for the
column space of A consisting of column vectors of A, whereas the procedure used in Example 6 produced a basis for the
row space of A, but that basis did not consist of row vectors of A. The following example shows how to adapt the
procedure from Example 7 to find a basis for the row space of a matrix that is formed from its row vectors.

EXAMPLE 9 Basis for the Row Space of a Matrix

Find a basis for the row space of

$$A = \begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}$$

consisting entirely of row vectors from A.

Solution We will transpose A, thereby converting the row space of A into the column space of $A^T$; then
we will use the method of Example 7 to find a basis for the column space of $A^T$; and then we will
transpose again to convert column vectors back to row vectors. Transposing A yields

$$A^T = \begin{bmatrix} 1 & 2 & 0 & 2 \\ -2 & -5 & 5 & 6 \\ 0 & -3 & 15 & 18 \\ 0 & -2 & 10 & 8 \\ 3 & 6 & 0 & 6 \end{bmatrix}$$

Reducing this matrix to row echelon form yields

$$\begin{bmatrix} 1 & 2 & 0 & 2 \\ 0 & 1 & -5 & -10 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

The first, second, and fourth columns contain the leading 1's, so the corresponding column vectors in $A^T$
form a basis for the column space of $A^T$; these are

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ 0 \\ 3 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 2 \\ -5 \\ -3 \\ -2 \\ 6 \end{bmatrix}, \quad\text{and}\quad \mathbf{c}_4 = \begin{bmatrix} 2 \\ 6 \\ 18 \\ 8 \\ 6 \end{bmatrix}$$

Transposing again and adjusting the notation appropriately yields the basis vectors

$$\mathbf{r}_1 = [1 \;\; -2 \;\; 0 \;\; 0 \;\; 3], \quad \mathbf{r}_2 = [2 \;\; -5 \;\; -3 \;\; -2 \;\; 6],$$

and

$$\mathbf{r}_4 = [2 \;\; 6 \;\; 18 \;\; 8 \;\; 6]$$

for the row space of A.



Next, we will give an example that adapts the methods we have developed above to solve the following general
problem in $R^n$:

PROBLEM

Given a set of vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ in $R^n$, find a subset of these vectors that forms a basis for span(S),
and express those vectors that are not in that basis as a linear combination of the basis vectors.



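EXAMPLE 10 Basis and Linear Combinations

(a) Find a subset of the vectors

$$\mathbf{v}_1 = (1, -2, 0, 3), \quad \mathbf{v}_2 = (2, -5, -3, 6),$$
$$\mathbf{v}_3 = (0, 1, 3, 0), \quad \mathbf{v}_4 = (2, -1, 4, -7), \quad \mathbf{v}_5 = (5, -8, 1, 2)$$

that forms a basis for the space spanned by these vectors.

(b) Express each vector not in the basis as a linear combination of the basis vectors.

Solution

(a) We begin by constructing a matrix that has $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_5$ as its column vectors:

$$\begin{bmatrix} 1 & 2 & 0 & 2 & 5 \\ -2 & -5 & 1 & -1 & -8 \\ 0 & -3 & 3 & 4 & 1 \\ 3 & 6 & 0 & -7 & 2 \end{bmatrix} \tag{5}$$

(the columns are, from left to right, $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5$).

The first part of our problem can be solved by finding a basis for the column space of this matrix.
Reducing the matrix to reduced row echelon form and denoting the column vectors of the resulting
matrix by $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, $\mathbf{w}_4$, and $\mathbf{w}_5$ yields

$$\begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{6}$$

(the columns are, from left to right, $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \mathbf{w}_4, \mathbf{w}_5$).

The leading 1's occur in columns 1, 2, and 4, so by Theorem 4.7.5,

$$\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_4\}$$

is a basis for the column space of 6, and consequently,

$$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$$

is a basis for the column space of 5.

(b) We will start by expressing $\mathbf{w}_3$ and $\mathbf{w}_5$ as linear combinations of the basis vectors $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_4$. The
simplest way of doing this is to express $\mathbf{w}_3$ and $\mathbf{w}_5$ in terms of basis vectors with smaller subscripts.
Accordingly, we will express $\mathbf{w}_3$ as a linear combination of $\mathbf{w}_1$ and $\mathbf{w}_2$, and we will express $\mathbf{w}_5$ as a
linear combination of $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_4$. By inspection of 6, these linear combinations are

$$\mathbf{w}_3 = 2\mathbf{w}_1 - \mathbf{w}_2$$
$$\mathbf{w}_5 = \mathbf{w}_1 + \mathbf{w}_2 + \mathbf{w}_4$$

We call these the dependency equations. The corresponding relationships in 5 are

$$\mathbf{v}_3 = 2\mathbf{v}_1 - \mathbf{v}_2$$
$$\mathbf{v}_5 = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_4$$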



The following is a summary of the steps that we followed in our last example to solve the problem posed above.

Basis for Span(S)

Step 1. Form the matrix A having vectors in $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ as column vectors.

Step 2. Reduce the matrix A to reduced row echelon form R.

Step 3. Denote the column vectors of R by $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$.

Step 4. Identify the columns of R that contain the leading 1's. The corresponding column vectors of A form a basis for
span(S).

This completes the first part of the problem.

Step 5. Obtain a set of dependency equations by expressing each column vector of R that does not contain a leading 1
as a linear combination of preceding column vectors that do contain leading 1's.

Step 6. Replace the column vectors of R that appear in the dependency equations by the corresponding column vectors
of A.

This completes the second part of the problem.
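Steps 1 through 6 can be sketched in code. The following Python sketch is illustrative rather than definitive: the names `rref` and `basis_and_dependencies` are assumptions introduced here, column indices are 0-based (so index 2 means the third vector), and exact fractions are used. It returns the indices of the vectors that form a basis for span(S), together with the coefficients of each dependency equation.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if src is None:
            continue
        A[pivot_row], A[src] = A[src], A[pivot_row]
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]  # create the leading 1
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

def basis_and_dependencies(vectors):
    """Return (pivot indices, dependency equations) for a list of vectors in R^n."""
    m = len(vectors[0])
    # Step 1: the vectors become the columns of A.
    A = [[v[i] for v in vectors] for i in range(m)]
    # Steps 2-3: reduce to reduced row echelon form R.
    R = rref(A)
    # Step 4: columns of R containing leading 1's (row i <-> pivots[i]).
    pivots = [
        next(j for j, value in enumerate(row) if value != 0)
        for row in R
        if any(value != 0 for value in row)
    ]
    # Steps 5-6: each non-pivot column is a combination of earlier pivot columns,
    # and the same coefficients relate the original vectors.
    dependencies = {}
    for j in range(len(vectors)):
        if j not in pivots:
            dependencies[j] = {
                p: R[i][j] for i, p in enumerate(pivots) if R[i][j] != 0
            }
    return pivots, dependencies

vectors = [
    (1, -2, 0, 3),
    (2, -5, -3, 6),
    (0, 1, 3, 0),
    (2, -1, 4, -7),
    (5, -8, 1, 2),
]
pivots, dependencies = basis_and_dependencies(vectors)
print(pivots)        # 0-based indices of the basis vectors
print(dependencies)  # {non-basis index: {basis index: coefficient}}
```

Applied to the vectors of Example 10, the pivot indices pick out the first, second, and fourth vectors, and the dependency dictionary encodes the two dependency equations found there.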



Concept Review 

• Row vectors 

• Column vectors 

• Row space 

• Column space 

• Null space 

• General solution 

• Particular solution 

• Relationships among linear systems and row spaces, column spaces, and null spaces 

• Relationships among the row space, column space, and null space of a matrix 

• Dependency equations 

Skills 

• Determine whether a given vector is in the column space of a matrix; if it is, express it as a linear 
combination of the column vectors of the matrix. 

• Find a basis for the null space of a matrix. 

• Find a basis for the row space of a matrix. 

• Find a basis for the column space of a matrix. 

• Find a basis for the span of a set of vectors in $R^n$.



Exercise Set 4.7 

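1. List the row vectors and column vectors of the matrix

\[
\begin{bmatrix}
2 & -1 & 0 & 1 \\
3 & 5 & 7 & -1 \\
1 & 4 & 2 & 7
\end{bmatrix}
\]

Answer:

$\mathbf{r}_1 = (2, -1, 0, 1)$, $\mathbf{r}_2 = (3, 5, 7, -1)$, $\mathbf{r}_3 = (1, 4, 2, 7)$;

$\mathbf{c}_1 = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} -1 \\ 5 \\ 4 \end{bmatrix}$, $\mathbf{c}_3 = \begin{bmatrix} 0 \\ 7 \\ 2 \end{bmatrix}$, $\mathbf{c}_4 = \begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix}$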



2. Express the product $A\mathbf{x}$ as a linear combination of the column vectors of $A$.



(a) 



(b) 



(c) 



2 3 

-1 4 

4 0 -1 

3 6 2 
0 -1 4 

-3 6 2 

5-4 0 

2 3-1 

1 8 3 



-2 

3 



-1 
2 
5 



(d) 



r2 1 5' 

[e 3 -8 



3 
0 

-5 



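3. Determine whether $\mathbf{b}$ is in the column space of $A$, and if so, express $\mathbf{b}$ as a linear combination of the column vectors
of $A$.

(a) $A = \begin{bmatrix} 1 & 3 \\ 4 & -6 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} -2 \\ 10 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & -1 & 1 \\ 9 & 3 & 1 \\ 1 & 1 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 5 \\ 1 \\ -1 \end{bmatrix}$

(d) $A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$

(e) $A = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & 2 & 1 \\ 1 & 2 & 1 & 3 \\ 0 & 1 & 2 & 2 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 4 \\ 3 \\ 5 \\ 7 \end{bmatrix}$

Answer:

(b) $\mathbf{b}$ is not in the column space of $A$.

(c) $\begin{bmatrix} 1 \\ 9 \\ 1 \end{bmatrix} - 3\begin{bmatrix} -1 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ -1 \end{bmatrix}$

(d) $\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + (t - 1)\begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} + t\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$

(e) $\begin{bmatrix} 4 \\ 3 \\ 5 \\ 7 \end{bmatrix} = -26\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + 13\begin{bmatrix} 2 \\ 1 \\ 2 \\ 1 \end{bmatrix} - 7\begin{bmatrix} 0 \\ 2 \\ 1 \\ 2 \end{bmatrix} + 4\begin{bmatrix} 1 \\ 1 \\ 3 \\ 2 \end{bmatrix}$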


4. Suppose that $x_1 = -1$, $x_2 = 2$, $x_3 = 4$, $x_4 = -3$ is a solution of a nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ and that
the solution set of the homogeneous system $A\mathbf{x} = \mathbf{0}$ is given by the formulas

(a) Find a vector form of the general solution of $A\mathbf{x} = \mathbf{0}$.

(b) Find a vector form of the general solution of $A\mathbf{x} = \mathbf{b}$.

5. In parts (a)-(d), find the vector form of the general solution of the given linear system $A\mathbf{x} = \mathbf{b}$; then use that result to
find the vector form of the general solution of $A\mathbf{x} = \mathbf{0}$.

(a) $\begin{aligned} x_1 - 3x_2 &= 1 \\ 2x_1 - 6x_2 &= 2 \end{aligned}$

(b) $\begin{aligned} x_1 + x_2 + 2x_3 &= 5 \\ x_1 + x_3 &= -2 \\ 2x_1 + x_2 + 3x_3 &= 3 \end{aligned}$

(c) $\begin{aligned} x_1 - 2x_2 + x_3 + 2x_4 &= -1 \\ 2x_1 - 4x_2 + 2x_3 + 4x_4 &= -2 \\ -x_1 + 2x_2 - x_3 - 2x_4 &= 1 \\ 3x_1 - 6x_2 + 3x_3 + 6x_4 &= -3 \end{aligned}$

(d) $\begin{aligned} x_1 + 2x_2 - 3x_3 + x_4 &= 4 \\ -2x_1 + x_2 + 2x_3 + x_4 &= -1 \\ -x_1 + 3x_2 - x_3 + 2x_4 &= 3 \\ 4x_1 - 7x_2 - 5x_4 &= -5 \end{aligned}$



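Answer:

(a) $\begin{bmatrix} 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 3 \\ 1 \end{bmatrix}$; $t\begin{bmatrix} 3 \\ 1 \end{bmatrix}$

(b) $\begin{bmatrix} -2 \\ 7 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$; $t\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + r\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -2 \\ 0 \\ 0 \\ 1 \end{bmatrix}$; $r\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -2 \\ 0 \\ 0 \\ 1 \end{bmatrix}$

(d) $\begin{bmatrix} 6/5 \\ 7/5 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 7/5 \\ 4/5 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1/5 \\ -3/5 \\ 0 \\ 1 \end{bmatrix}$; $s\begin{bmatrix} 7/5 \\ 4/5 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1/5 \\ -3/5 \\ 0 \\ 1 \end{bmatrix}$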



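6. Find a basis for the null space of $A$.

(a) $A = \begin{bmatrix} 1 & -1 & 3 \\ 5 & -4 & -4 \\ 7 & -6 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & 0 & -1 \\ 4 & 0 & -2 \\ 0 & 0 & 0 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 4 & 5 & 2 \\ 2 & 1 & 3 & 0 \\ -1 & 3 & 2 & 2 \end{bmatrix}$

(d) $A = \begin{bmatrix} 1 & 4 & 5 & 6 & 9 \\ 3 & -2 & 1 & 4 & -1 \\ -1 & 0 & -1 & -2 & -1 \\ 2 & 3 & 5 & 7 & 8 \end{bmatrix}$

(e) $A = \begin{bmatrix} 1 & -3 & 2 & 2 & 1 \\ 0 & 3 & 6 & 0 & -3 \\ 2 & -3 & -2 & 4 & 4 \\ 3 & -6 & 0 & 6 & 5 \\ -2 & 9 & 2 & -4 & -5 \end{bmatrix}$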



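7. In each part, a matrix in row echelon form is given. By inspection, find bases for the row and column spaces of $A$.

(a) $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 2 & 4 & 5 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 2 & -1 & 5 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

(a) $\mathbf{r}_1 = [1 \;\; 0 \;\; 2]$, $\mathbf{r}_2 = [0 \;\; 0 \;\; 1]$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$

(b) $\mathbf{r}_1 = [1 \;\; -3 \;\; 0 \;\; 0]$, $\mathbf{r}_2 = [0 \;\; 1 \;\; 0 \;\; 0]$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$

(c) $\mathbf{r}_1 = [1 \;\; 2 \;\; 4 \;\; 5]$, $\mathbf{r}_2 = [0 \;\; 1 \;\; -3 \;\; 0]$, $\mathbf{r}_3 = [0 \;\; 0 \;\; 1 \;\; -3]$, $\mathbf{r}_4 = [0 \;\; 0 \;\; 0 \;\; 1]$,

$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_3 = \begin{bmatrix} 4 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 0 \\ -3 \\ 1 \\ 0 \end{bmatrix}$

(d) $\mathbf{r}_1 = [1 \;\; 2 \;\; -1 \;\; 5]$, $\mathbf{r}_2 = [0 \;\; 1 \;\; 4 \;\; 3]$, $\mathbf{r}_3 = [0 \;\; 0 \;\; 1 \;\; -7]$, $\mathbf{r}_4 = [0 \;\; 0 \;\; 0 \;\; 1]$,

$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_3 = \begin{bmatrix} -1 \\ 4 \\ 1 \\ 0 \end{bmatrix}$, $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 1 \end{bmatrix}$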



8. For the matrices in Exercise 6, find a basis for the row space of A by reducing the matrix to row echelon form. 

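9. By inspection, find a basis for the row space and a basis for the column space of each matrix.

(a) $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 2 & 4 & 5 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 2 & -1 & 5 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

(a) $\mathbf{r}_1 = [1 \;\; 0 \;\; 2]$; $\mathbf{r}_2 = [0 \;\; 0 \;\; 1]$; $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$

(b) $\mathbf{r}_1 = [1 \;\; -3 \;\; 0 \;\; 0]$; $\mathbf{r}_2 = [0 \;\; 1 \;\; 0 \;\; 0]$; $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$

(c) $\mathbf{r}_1 = [1 \;\; 2 \;\; 4 \;\; 5]$; $\mathbf{r}_2 = [0 \;\; 1 \;\; -3 \;\; 0]$; $\mathbf{r}_3 = [0 \;\; 0 \;\; 1 \;\; -3]$; $\mathbf{r}_4 = [0 \;\; 0 \;\; 0 \;\; 1]$;

$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_3 = \begin{bmatrix} 4 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 0 \\ -3 \\ 1 \\ 0 \end{bmatrix}$

(d) $\mathbf{r}_1 = [1 \;\; 2 \;\; -1 \;\; 5]$; $\mathbf{r}_2 = [0 \;\; 1 \;\; 4 \;\; 3]$; $\mathbf{r}_3 = [0 \;\; 0 \;\; 1 \;\; -7]$; $\mathbf{r}_4 = [0 \;\; 0 \;\; 0 \;\; 1]$;

$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_3 = \begin{bmatrix} -1 \\ 4 \\ 1 \\ 0 \end{bmatrix}$; $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 1 \end{bmatrix}$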



10. For the matrices in Exercise 6, find a basis for the row space of $A$ consisting entirely of row vectors of $A$.

11. Find a basis for the subspace of $R^4$ spanned by the given vectors.

(a) (1, 1, -4, -3), (2, 0, 2, -2), (2, -1, 3, 2)

(b) (-1, 1, -2, 0), (3, 3, 6, 0), (9, 0, 0, 3)

(c) (1, 1, 0, 0), (0, 0, 1, 1), (-2, 0, 2, 2), (0, -3, 0, 3)

Answer:

(a) (1, 1, -4, -3), (0, 1, -5, -2), (0, 0, 1, -1/2)

(b) (1, -1, 2, 0), (0, 1, 0, 0), (0, 0, 1, -1/6)

(c) (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)

12. Find a subset of the vectors that forms a basis for the space spanned by the vectors; then express each vector that is
not in the basis as a linear combination of the basis vectors.

(a) $\mathbf{v}_1 = (1, 0, 1, 1)$, $\mathbf{v}_2 = (-3, 3, 7, 1)$, $\mathbf{v}_3 = (-1, 3, 9, 3)$, $\mathbf{v}_4 = (-5, 3, 5, -1)$

(b) $\mathbf{v}_1 = (1, -2, 0, 3)$, $\mathbf{v}_2 = (2, -4, 0, 6)$, $\mathbf{v}_3 = (-1, 1, 2, 0)$, $\mathbf{v}_4 = (0, -1, 2, 3)$

(c) $\mathbf{v}_1 = (1, -1, 5, 2)$, $\mathbf{v}_2 = (-2, 3, 1, 0)$, $\mathbf{v}_3 = (4, -5, 9, 4)$, $\mathbf{v}_4 = (0, 4, 2, -3)$, $\mathbf{v}_5 = (-7, 18, 2, -8)$

13. Prove that the row vectors of an $n \times n$ invertible matrix $A$ form a basis for $R^n$.

14. Construct a matrix whose null space consists of all linear combinations of the vectors

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 3 \\ 2 \end{bmatrix}
\quad\text{and}\quad
\mathbf{v}_2 = \begin{bmatrix} 2 \\ 0 \\ -2 \\ 4 \end{bmatrix}
\]



15. (a) Let

\[
A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

Show that relative to an xyz-coordinate system in 3-space the null space of A consists of all points on the z-axis
and that the column space consists of all points in the xy-plane (see the accompanying figure).

(b) Find a $3 \times 3$ matrix whose null space is the x-axis and whose column space is the yz-plane.

Figure Ex-15 (the null space of A and the column space of A)

Answer:

(b) $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$



16. Find a 3 x 3 matrix whose null space is 

(a) a point. 

(b) a line. 

(c) a plane. 

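17. (a) Find all $2 \times 2$ matrices whose null space is the line $3x - 5y = 0$.
(b) Sketch the null spaces of the following matrices:

\[
A = \begin{bmatrix} 1 & 4 \\ 0 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}, \quad
C = \begin{bmatrix} 6 & 2 \\ 3 & 1 \end{bmatrix}, \quad
D = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

Answer:

(a) $\begin{bmatrix} 3a & -5a \\ 3b & -5b \end{bmatrix}$ for all real numbers a, b not both 0.

(b) Since A and B are invertible, their null spaces are the origin. The null space of C is the line $3x + y = 0$. The null
space of D is the entire xy-plane.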

18. The equation $x_1 + x_2 + x_3 = 1$ can be viewed as a linear system of one equation in three unknowns. Express its
general solution as a particular solution plus the general solution of the corresponding homogeneous system.
[Suggestion: Write the vectors in column form.]

19. Suppose that A and B are $n \times n$ matrices and A is invertible. Invent and prove a theorem that describes how the row
spaces of AB and B are related.

True-False Exercises 

In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 

(a) The span of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is the column space of the matrix whose column vectors are $\mathbf{v}_1, \ldots, \mathbf{v}_n$.
Answer:

True

(b) The column space of a matrix A is the set of solutions of $A\mathbf{x} = \mathbf{b}$.
Answer:

False

(c) If R is the reduced row echelon form of A, then those column vectors of R that contain the leading 1's form a basis for
the column space of A.

Answer:

False

(d) The set of nonzero row vectors of a matrix A is a basis for the row space of A.
Answer:

False

(e) If A and B are $n \times n$ matrices that have the same row space, then A and B have the same column space.
Answer:

False

(f) If E is an $m \times m$ elementary matrix and A is an $m \times n$ matrix, then the null space of EA is the same as the null space
of A.

Answer:

True

(g) If E is an $m \times m$ elementary matrix and A is an $m \times n$ matrix, then the row space of EA is the same as the row space
of A.

Answer:

True

(h) If E is an $m \times m$ elementary matrix and A is an $m \times n$ matrix, then the column space of EA is the same as the column
space of A.

Answer:

False

(i) The system $A\mathbf{x} = \mathbf{b}$ is inconsistent if and only if $\mathbf{b}$ is not in the column space of A.
Answer:

True

(j) There is an invertible matrix A and a singular matrix B such that the row spaces of A and B are the same.
Answer:
False
4.8 Rank, Nullity, and the Fundamental Matrix Spaces

In the last section we investigated relationships between a system of linear equations and the row space, column 
space, and null space of its coefficient matrix. In this section we will be concerned with the dimensions of those 
spaces. The results we obtain will provide a deeper insight into the relationship between a linear system and its
coefficient matrix. 



Row and Column Spaces Have Equal Dimensions 

In Examples 6 and 7 of Section 4.7 we found that the row and column spaces of the matrix 



\[
A = \begin{bmatrix}
1 & -3 & 4 & -2 & 5 & 4 \\
2 & -6 & 9 & -1 & 8 & 2 \\
2 & -6 & 9 & -1 & 9 & 7 \\
-1 & 3 & -4 & 2 & -5 & -4
\end{bmatrix}
\]

both have three basis vectors and hence are both three-dimensional. The fact that these spaces have the same 
dimension is not accidental, but rather a consequence of the following theorem. 



THEOREM 4.8.1 

The row space and column space of a matrix A have the same dimension. 

Proof Let R be any row echelon form of A. It follows from Theorem 4.7.4 and Theorem 4.7.6b that

dim(row space of A) = dim(row space of R)
dim(column space of A) = dim(column space of R)

so it suffices to show that the row and column spaces of R have the same dimension. But the dimension of the row
space of R is the number of nonzero rows, and by Theorem 4.7.5 the dimension of the column space of R is the
number of leading 1's. Since these two numbers are the same, the row and column space have the same dimension.



Rank and Nullity

The dimensions of the row space, column space, and null space of a matrix are such important numbers that there is
some notation and terminology associated with them.

DEFINITION 1

The common dimension of the row space and column space of a matrix A is called the rank of A and is
denoted by rank(A); the dimension of the null space of A is called the nullity of A and is denoted by
nullity(A).

The proof of Theorem 4.8.1 shows that the rank
of A can be interpreted as the number of leading
1's in any row echelon form of A.



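EXAMPLE 1 Rank and Nullity of a 4 x 6 Matrix

Find the rank and nullity of the matrix

\[
A = \begin{bmatrix}
-1 & 2 & 0 & 4 & 5 & -3 \\
3 & -7 & 2 & 0 & 1 & 4 \\
2 & -5 & 2 & 4 & 6 & 1 \\
4 & -9 & 2 & -4 & -4 & 7
\end{bmatrix}
\]

The reduced row echelon form of A is

\[
\begin{bmatrix}
1 & 0 & -4 & -28 & -37 & 13 \\
0 & 1 & -2 & -12 & -16 & 5 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{1}
\]

(verify). Since this matrix has two leading 1's, its row and column spaces are two-dimensional and
rank(A) = 2. To find the nullity of A, we must find the dimension of the solution space of the linear
system $A\mathbf{x} = \mathbf{0}$. This system can be solved by reducing its augmented matrix to reduced row echelon
form. The resulting matrix will be identical to (1), except that it will have an additional last column of
zeros, and hence the corresponding system of equations will be

\[
\begin{aligned}
x_1 - 4x_3 - 28x_4 - 37x_5 + 13x_6 &= 0 \\
x_2 - 2x_3 - 12x_4 - 16x_5 + 5x_6 &= 0
\end{aligned}
\]

Solving these equations for the leading variables yields

\[
\begin{aligned}
x_1 &= 4x_3 + 28x_4 + 37x_5 - 13x_6 \\
x_2 &= 2x_3 + 12x_4 + 16x_5 - 5x_6
\end{aligned}
\tag{2}
\]

from which we obtain the general solution

\[
\begin{aligned}
x_1 &= 4r + 28s + 37t - 13u \\
x_2 &= 2r + 12s + 16t - 5u \\
x_3 &= r \\
x_4 &= s \\
x_5 &= t \\
x_6 &= u
\end{aligned}
\]

or in column vector form

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= r\begin{bmatrix} 4 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ s\begin{bmatrix} 28 \\ 12 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ t\begin{bmatrix} 37 \\ 16 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ u\begin{bmatrix} -13 \\ -5 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\tag{3}
\]

Because the four vectors on the right side of (3) form a basis for the solution space, nullity(A) = 4.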
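The "(verify)" step in Example 1 can be carried out mechanically. The following Python sketch is an illustration only, not part of the text: `rref` is a hypothetical helper written for this purpose, and exact fractions from the standard library are used so that no rounding occurs. It counts leading 1's to obtain the rank, counts free variables to obtain the nullity, and builds one null-space basis vector per free variable.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]          # make the leading entry 1
        for i in range(len(m)):                  # clear the rest of column c
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]

R, pivots = rref(A)
n = len(A[0])
free = [j for j in range(n) if j not in pivots]   # columns of the free variables
rank, nullity = len(pivots), len(free)

# One basis vector per free variable: set it to 1, the other free variables to 0,
# and read the leading variables off the reduced row echelon form.
null_basis = []
for f in free:
    v = [Fraction(0)] * n
    v[f] = Fraction(1)
    for i, p in enumerate(pivots):
        v[p] = -R[i][f]
    null_basis.append(v)

print(rank, nullity)
for v in null_basis:
    print([int(x) for x in v])
```

The basis vectors produced this way coincide with the four column vectors on the right side of Formula (3).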



EXAMPLE 2 Maximum Value for Rank

What is the maximum possible rank of an $m \times n$ matrix A that is not square?

Solution Since the row vectors of A lie in $R^n$ and the column vectors in $R^m$, the row space of A is
at most n-dimensional and the column space is at most m-dimensional. Since the rank of A is the
common dimension of its row and column space, it follows that the rank is at most the smaller of m
and n. We denote this by writing

\[ \operatorname{rank}(A) \le \min(m, n) \]

in which min(m, n) is the minimum of m and n.

The following theorem establishes an important relationship between the rank and nullity of a matrix. 

THEOREM 4.8.2 Dimension Theorem for Matrices

If A is a matrix with n columns, then

\[ \operatorname{rank}(A) + \operatorname{nullity}(A) = n \tag{4} \]

Proof Since A has n columns, the homogeneous linear system $A\mathbf{x} = \mathbf{0}$ has n unknowns (variables). These fall into
two distinct categories: the leading variables and the free variables. Thus,

\[ [\text{number of leading variables}] + [\text{number of free variables}] = n \]

But the number of leading variables is the same as the number of leading 1's in the reduced row echelon form of A,
which is the rank of A; and the number of free variables is the same as the number of parameters in the general
solution of $A\mathbf{x} = \mathbf{0}$, which is the nullity of A. This yields Formula 4.


EXAMPLE 3 The Sum of Rank and Nullity

The matrix

\[
A = \begin{bmatrix}
-1 & 2 & 0 & 4 & 5 & -3 \\
3 & -7 & 2 & 0 & 1 & 4 \\
2 & -5 & 2 & 4 & 6 & 1 \\
4 & -9 & 2 & -4 & -4 & 7
\end{bmatrix}
\]

has 6 columns, so

rank(A) + nullity(A) = 6

This is consistent with Example 1, where we showed that

rank(A) = 2 and nullity(A) = 4



The following theorem, which summarizes results already obtained, interprets rank and nullity in the context of a 
homogeneous linear system. 



THEOREM 4.8.3

If A is an $m \times n$ matrix, then

(a) rank(A) = the number of leading variables in the general solution of $A\mathbf{x} = \mathbf{0}$.
(b) nullity(A) = the number of parameters in the general solution of $A\mathbf{x} = \mathbf{0}$.

EXAMPLE 4 Number of Parameters in a General Solution

Find the number of parameters in the general solution of $A\mathbf{x} = \mathbf{0}$ if A is a $5 \times 7$ matrix of rank 3.

Solution From (4),

nullity(A) = n - rank(A) = 7 - 3 = 4

Thus there are four parameters.



Equivalence Theorem 

In Theorem 2.3.8 we listed seven results that are equivalent to the invertibility of a square matrix A. We are now in
a position to add eight more results to that list to produce a single theorem that summarizes most of the topics we
have covered thus far.

THEOREM 4.8.4 Equivalent Statements

If A is an $n \times n$ matrix, then the following statements are equivalent.

(a) A is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \ne 0$.

(h) The column vectors of A are linearly independent.

(i) The row vectors of A are linearly independent.

(j) The column vectors of A span $R^n$.

(k) The row vectors of A span $R^n$.

(l) The column vectors of A form a basis for $R^n$.

(m) The row vectors of A form a basis for $R^n$.

(n) A has rank n.

(o) A has nullity 0.

Proof The equivalence of (h) through (m) follows from Theorem 4.5.4 (we omit the details). To complete the
proof we will show that (b), (n), and (o) are equivalent by proving the chain of implications
(b) $\Rightarrow$ (o) $\Rightarrow$ (n) $\Rightarrow$ (b).

(b) $\Rightarrow$ (o) If $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, then there are no parameters in that solution, so nullity(A) = 0
by Theorem 4.8.3b.

(o) $\Rightarrow$ (n) Theorem 4.8.2.

(n) $\Rightarrow$ (b) If A has rank n, then Theorem 4.8.3a implies that there are n leading variables (hence no free variables)
in the general solution of $A\mathbf{x} = \mathbf{0}$. This leaves the trivial solution as the only possibility.
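Statements (c), (n), and (o) of the equivalence theorem are easy to compare numerically. The Python sketch below is illustrative only: the two $3 \times 3$ matrices are made-up examples rather than matrices from the text, and `rref` and `invertibility_summary` are hypothetical helper names. For a square matrix, the three tests should either all succeed or all fail.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]          # make the leading entry 1
        for i in range(len(m)):                  # clear the rest of column c
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def invertibility_summary(A):
    """Evaluate statements (c), (n), (o) of the theorem for a square matrix A."""
    n = len(A)
    R, pivots = rref(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return {
        "rref_is_identity": R == identity,   # statement (c)
        "rank": len(pivots),                 # statement (n) holds when rank == n
        "nullity": n - len(pivots),          # statement (o) holds when nullity == 0
    }


invertible_example = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # determinant 18
singular_example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]     # row 3 = 2*row 2 - row 1

print(invertibility_summary(invertible_example))
print(invertibility_summary(singular_example))
```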



Overdetermined and Underdetermined Systems 

In many applications the equations in a linear system correspond to physical constraints or conditions that must be 
satisfied. In general, the most desirable systems are those that have the same number of constraints as unknowns, 
since such systems often have a unique solution. Unfortunately, it is not always possible to match the number of 
constraints and unknowns, so researchers are often faced with linear systems that have more constraints than 
unknowns, called overdetermined systems, or with fewer constraints than unknowns, called underdetermined 
systems. The following two theorems will help us to analyze both overdetermined and underdetermined systems. 



In engineering and other applications, the
occurrence of an overdetermined or
underdetermined linear system often signals that
one or more variables were omitted in formulating
the problem or that extraneous variables were
included. This often leads to some kind of
undesirable physical result.

THEOREM 4.8.5

If $A\mathbf{x} = \mathbf{b}$ is a consistent linear system of m equations in n unknowns, and if A has rank r, then the general
solution of the system contains n - r parameters.

Proof It follows from Theorem 4.7.2 that the number of parameters is equal to the nullity of A, which, by
Theorem 4.8.2, is n - r.

THEOREM 4.8.6

Let A be an $m \times n$ matrix.

(a) (Overdetermined Case) If m > n, then the linear system $A\mathbf{x} = \mathbf{b}$ is inconsistent for at least one vector
$\mathbf{b}$ in $R^m$.

(b) (Underdetermined Case) If m < n, then for each vector $\mathbf{b}$ in $R^m$ the linear system $A\mathbf{x} = \mathbf{b}$ is either
inconsistent or has infinitely many solutions.

Proof (a) Assume that m > n, in which case the column vectors of A cannot span $R^m$ (fewer vectors than the
dimension of $R^m$). Thus, there is at least one vector $\mathbf{b}$ in $R^m$ that is not in the column space of A, and for that $\mathbf{b}$ the
system $A\mathbf{x} = \mathbf{b}$ is inconsistent by Theorem 4.7.1.

Proof (b) Assume that m < n. For each vector $\mathbf{b}$ in $R^m$ there are two possibilities: either the system $A\mathbf{x} = \mathbf{b}$ is
consistent or it is inconsistent. If it is inconsistent, then the proof is complete. If it is consistent, then Theorem 4.8.5
implies that the general solution has n - r parameters, where r = rank(A). But rank(A) is at most the smaller of m and n,
which is m in this case, so

\[ n - r \ge n - m > 0 \]

This means that the general solution has at least one parameter and hence there are infinitely many solutions.



EXAMPLE 5 Overdetermined and Underdetermined Systems

(a) What can you say about the solutions of an overdetermined system $A\mathbf{x} = \mathbf{b}$ of 7 equations in 5
unknowns in which A has rank r = 4?

(b) What can you say about the solutions of an underdetermined system $A\mathbf{x} = \mathbf{b}$ of 5 equations in 7
unknowns in which A has rank r = 4?

Solution

(a) The system is consistent for some vector $\mathbf{b}$ in $R^7$, and for any such $\mathbf{b}$ the number of parameters in
the general solution is n - r = 5 - 4 = 1.

(b) The system may be consistent or inconsistent, but if it is consistent for the vector $\mathbf{b}$ in $R^5$, then the
general solution has n - r = 7 - 4 = 3 parameters.
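The reasoning in Theorem 4.8.5 and Example 5 can be packaged as a small decision rule. The Python sketch below is an illustration, not part of the text. It relies on a standard fact that this section does not state as a theorem, namely that $A\mathbf{x} = \mathbf{b}$ is consistent exactly when rank(A) equals the rank of the augmented matrix $[A \mid \mathbf{b}]$; the function name `describe_solutions` is hypothetical.

```python
def describe_solutions(n, rank_A, rank_aug):
    """Classify A x = b from the number of unknowns n and the two ranks.

    Returns ("inconsistent", None) or ("consistent", number_of_parameters).
    Assumes the standard criterion: consistent iff rank(A) == rank([A | b]).
    """
    if rank_aug != rank_A:
        return ("inconsistent", None)
    # Theorem 4.8.5: a consistent system has n - r parameters.
    return ("consistent", n - rank_A)


# Example 5(a): 5 unknowns, rank 4, for a right side that makes the system consistent.
print(describe_solutions(5, 4, 4))
# Example 5(b): 7 unknowns, rank 4, again assuming consistency.
print(describe_solutions(7, 4, 4))
# A right side outside the column space raises the rank of the augmented matrix.
print(describe_solutions(5, 4, 5))
```

A result of `("consistent", 0)` corresponds to a unique solution, and any positive parameter count corresponds to infinitely many solutions.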



EXAMPLE 6 An Overdetermined System

The linear system

\[
\begin{aligned}
x_1 - 2x_2 &= b_1 \\
x_1 - x_2 &= b_2 \\
x_1 + x_2 &= b_3 \\
x_1 + 2x_2 &= b_4 \\
x_1 + 3x_2 &= b_5
\end{aligned}
\tag{5}
\]

is overdetermined, so it cannot be consistent for all possible values of $b_1, b_2, b_3, b_4, b_5$. Exact
conditions under which the system is consistent can be obtained by solving the linear system by
Gauss-Jordan elimination. We leave it for you to show that the augmented matrix is row equivalent to

\[
\left[\begin{array}{cc|c}
1 & 0 & 2b_2 - b_1 \\
0 & 1 & b_2 - b_1 \\
0 & 0 & b_3 - 3b_2 + 2b_1 \\
0 & 0 & b_4 - 4b_2 + 3b_1 \\
0 & 0 & b_5 - 5b_2 + 4b_1
\end{array}\right]
\]

Thus, the system is consistent if and only if $b_1, b_2, b_3, b_4, b_5$ satisfy the conditions

\[
\begin{aligned}
2b_1 - 3b_2 + b_3 &= 0 \\
3b_1 - 4b_2 + b_4 &= 0 \\
4b_1 - 5b_2 + b_5 &= 0
\end{aligned}
\]

Solving this homogeneous linear system yields

\[ b_1 = 5r - 4s, \quad b_2 = 4r - 3s, \quad b_3 = 2r - s, \quad b_4 = r, \quad b_5 = s \]

where r and s are arbitrary.
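The consistency conditions of Example 6 can be spot-checked numerically. The following Python sketch is illustrative and not part of the text: it assumes the coefficient matrix of the five-equation system in $x_1, x_2$ with coefficients of $x_2$ equal to -2, -1, 1, 2, 3, and it tests consistency by comparing rank(A) with the rank of the augmented matrix, a standard criterion not proved in this section. The helper names `rref`, `is_consistent`, and `b_from_parameters` are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]          # make the leading entry 1
        for i in range(len(m)):                  # clear the rest of column c
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def rank(rows):
    return len(rref(rows)[1])


def is_consistent(A, b):
    """A x = b is consistent exactly when appending b does not raise the rank."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)


def b_from_parameters(r, s):
    """Right sides allowed by the example: b1 = 5r - 4s, ..., b5 = s."""
    return [5 * r - 4 * s, 4 * r - 3 * s, 2 * r - s, r, s]


A = [[1, -2], [1, -1], [1, 1], [1, 2], [1, 3]]

print(is_consistent(A, b_from_parameters(1, 1)))    # a right side of the allowed form
print(is_consistent(A, b_from_parameters(2, -3)))   # another allowed right side
print(is_consistent(A, [1, 0, 0, 0, 0]))            # violates 2*b1 - 3*b2 + b3 = 0
```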



Remark The coefficient matrix for the linear system in the last example has n = 2 columns, and it has rank r = 2
because there are two nonzero rows in its reduced row echelon form. This implies that when the system is
consistent its general solution will contain n - r = 0 parameters; that is, the solution will be unique. With a
moment's thought, you should be able to see that this is so from (5).



The Fundamental Spaces of a Matrix

There are six important vector spaces associated with a matrix A and its transpose $A^T$:

row space of A          row space of $A^T$
column space of A       column space of $A^T$
null space of A         null space of $A^T$

However, transposing a matrix converts row vectors into column vectors and conversely, so except for a difference
in notation, the row space of $A^T$ is the same as the column space of A, and the column space of $A^T$ is the same as
the row space of A. Thus, of the six spaces listed above, only the following four are distinct:

row space of A          column space of A
null space of A         null space of $A^T$

These are called the fundamental spaces of a matrix A. We will conclude this section by discussing how these four
subspaces are related.

If A is an $m \times n$ matrix, then the row space and
null space of A are subspaces of $R^n$, and the
column space of A and the null space of $A^T$ are
subspaces of $R^m$.

Let us focus for a moment on the matrix $A^T$. Since the row space and column space of a matrix have the same
dimension, and since transposing a matrix converts its columns to rows and its rows to columns, the following
result should not be surprising.

THEOREM 4.8.7

If A is any matrix, then $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.

Proof

\[ \operatorname{rank}(A) = \dim(\text{row space of } A) = \dim(\text{column space of } A^T) = \operatorname{rank}(A^T). \]
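The equality of rank(A) and rank($A^T$) can be observed on a concrete matrix. The Python sketch below is illustrative only: it uses the $4 \times 6$ matrix with first row (1, -3, 4, -2, 5, 4) from the opening of this section, whose row and column spaces are stated there to be three-dimensional, and the helper names `rref`, `rank`, and `transpose` are hypothetical. The rank is computed by counting leading 1's after row reduction.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]          # make the leading entry 1
        for i in range(len(m)):                  # clear the rest of column c
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def rank(rows):
    """Number of leading 1's in the reduced row echelon form."""
    return len(rref(rows)[1])


def transpose(rows):
    return [list(col) for col in zip(*rows)]


A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]

print(rank(A), rank(transpose(A)))   # the two ranks agree
```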



This result has some important implications. For example, if A is an $m \times n$ matrix, then applying Formula 4 to the
matrix $A^T$ and using the fact that this matrix has m columns yields

\[ \operatorname{rank}(A^T) + \operatorname{nullity}(A^T) = m \]

which, by virtue of Theorem 4.8.7, can be rewritten as

\[ \operatorname{rank}(A) + \operatorname{nullity}(A^T) = m \tag{6} \]

This alternative form of Formula 4 in Theorem 4.8.2 makes it possible to express the dimensions of all four
fundamental spaces in terms of the size and rank of A. Specifically, if rank(A) = r, then

\[
\begin{aligned}
\dim[\operatorname{row}(A)] &= r & \dim[\operatorname{col}(A)] &= r \\
\dim[\operatorname{null}(A)] &= n - r & \dim[\operatorname{null}(A^T)] &= m - r
\end{aligned}
\tag{7}
\]
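Since the dimensions of the four fundamental spaces depend only on the size m x n and the rank r, they can be tabulated by a one-line computation. The Python sketch below is an illustration, with the hypothetical function name `fundamental_dimensions`; it applies the Dimension Theorem to A (n columns) and to $A^T$ (m columns).

```python
def fundamental_dimensions(m, n, r):
    """Dimensions of the four fundamental spaces of an m x n matrix of rank r."""
    if not 0 <= r <= min(m, n):
        raise ValueError("the rank must satisfy 0 <= r <= min(m, n)")
    return {
        "row space of A": r,
        "column space of A": r,
        "null space of A": n - r,       # rank(A) + nullity(A) = n
        "null space of A^T": m - r,     # rank(A) + nullity(A^T) = m
    }


# A 5 x 9 matrix of rank 2, and its 9 x 5 counterpart of the same rank.
print(fundamental_dimensions(5, 9, 2))
print(fundamental_dimensions(9, 5, 2))
```

Swapping m and n exchanges the dimensions of the two null spaces, as one would expect from replacing A by its transpose.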



The four formulas in (7) provide an algebraic relationship between the size of a matrix and the dimensions of its
fundamental spaces. Our next objective is to find a geometric relationship between the fundamental spaces
themselves. For this purpose recall from Theorem 3.4.3 that if A is an $m \times n$ matrix, then the null space of A
consists of those vectors that are orthogonal to each of the row vectors of A. To develop that idea in more detail, we
make the following definition.

DEFINITION 2

If W is a subspace of $R^n$, then the set of all vectors in $R^n$ that are orthogonal to every vector in W is called
the orthogonal complement of W and is denoted by the symbol $W^\perp$.

The following theorem lists three basic properties of orthogonal complements. We will omit the formal proof
because a more general version of this theorem will be given later in the text.

THEOREM 4.8.8

If W is a subspace of $R^n$, then:

(a) $W^\perp$ is a subspace of $R^n$.

(b) The only vector common to W and $W^\perp$ is $\mathbf{0}$.

(c) The orthogonal complement of $W^\perp$ is W.

EXAMPLE 7 Orthogonal Complements

In $R^2$ the orthogonal complement of a line W through the origin is the line through the origin that is
perpendicular to W (Figure 4.8.1a); and in $R^3$ the orthogonal complement of a plane W through the
origin is the line through the origin that is perpendicular to that plane (Figure 4.8.1b).

Figure 4.8.1

Explain why $\{\mathbf{0}\}$ and $R^n$ are orthogonal
complements.


A Geometric Link Between the Fundamental Spaces

The following theorem provides a geometric link between the fundamental spaces of a matrix. Part (a) is essentially
a restatement of Theorem 3.4.3 in the language of orthogonal complements, and part (b), whose proof is left as an
exercise, follows from part (a). The essential idea of the theorem is illustrated in Figure 4.8.2.

THEOREM 4.8.9

If A is an $m \times n$ matrix, then:

(a) The null space of A and the row space of A are orthogonal complements in $R^n$.

(b) The null space of $A^T$ and the column space of A are orthogonal complements in $R^m$.

Figure 4.8.2
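Part (a) of the theorem can be checked on the matrix of Example 1. The Python sketch below is illustrative and not part of the text: it takes the four null-space basis vectors found in Example 1 as given data and verifies that each is orthogonal to every row of A, and that an arbitrary combination of rows is orthogonal to an arbitrary combination of null-space vectors. It checks orthogonality only; that the two spaces are full orthogonal complements is also supported by the dimension count 2 + 4 = 6. The helper name `dot` is hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Matrix of Example 1 (its rows span the row space).
A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]

# Basis for the null space found in Example 1.
null_basis = [[4, 2, 1, 0, 0, 0],
              [28, 12, 0, 1, 0, 0],
              [37, 16, 0, 0, 1, 0],
              [-13, -5, 0, 0, 0, 1]]

# Every row of A against every null-space basis vector.
products = [[dot(row, v) for v in null_basis] for row in A]
print(products)

# Orthogonality carries over to linear combinations on both sides.
row_combo = [2 * a - 3 * b for a, b in zip(A[0], A[1])]
null_combo = [x + 5 * y for x, y in zip(null_basis[0], null_basis[3])]
print(dot(row_combo, null_combo))
```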



More on the Equivalence Theorem 



As our final result in this section, we will add two more statements to Theorem 4.8.4. We leave the proof that those 
statements are equivalent to the rest as an exercise. 



THEOREM 4.8.10 Equivalent Statements

If A is an $n \times n$ matrix, then the following statements are equivalent.

(a) A is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \ne 0$.

(h) The column vectors of A are linearly independent.

(i) The row vectors of A are linearly independent.

(j) The column vectors of A span $R^n$.

(k) The row vectors of A span $R^n$.

(l) The column vectors of A form a basis for $R^n$.

(m) The row vectors of A form a basis for $R^n$.

(n) A has rank n.

(o) A has nullity 0.

(p) The orthogonal complement of the null space of A is $R^n$.

(q) The orthogonal complement of the row space of A is $\{\mathbf{0}\}$.



Applications of Rank 

The advent of the Internet has stimulated research on finding efficient methods for transmitting large amounts of 
digital data over communications lines with limited bandwidths. Digital data are commonly stored in matrix form, 
and many techniques for improving transmission speed use the rank of a matrix in some way. Rank plays a role 
because it measures the "redundancy" in a matrix in the sense that if A is an $m \times n$ matrix of rank k, then n - k of
the column vectors and m - k of the row vectors can be expressed in terms of k linearly independent column or
row vectors. The essential idea in many data compression schemes is to approximate the original data set by a data 
set with smaller rank that conveys nearly the same information, then eliminate redundant vectors in the 
approximating set to speed up the transmission time. 
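The redundancy described above can be made concrete with a small sketch. The Python code below is an illustration only and is not claimed to be how any actual compression scheme works: it writes a rank-k matrix A as a product C F, where C holds the k pivot columns of A and F holds the k nonzero rows of its reduced row echelon form, so that k(m + n) numbers suffice in place of mn. The helper names `rref`, `rank_factorization`, and `matmul` are hypothetical, and the matrix is the one from Example 1.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]          # make the leading entry 1
        for i in range(len(m)):                  # clear the rest of column c
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def rank_factorization(A):
    """Return (C, F) with A = C F: pivot columns of A, nonzero rows of rref(A)."""
    R, pivots = rref(A)
    C = [[row[p] for p in pivots] for row in A]
    F = R[:len(pivots)]
    return C, F


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]

C, F = rank_factorization(A)
stored = len(C) * len(C[0]) + len(F) * len(F[0])
print(matmul(C, F) == A)                  # the product reproduces A exactly
print(stored, len(A) * len(A[0]))         # numbers stored: factored form vs. full matrix
```

The saving is modest for a matrix this small, but it grows quickly when k is small compared with m and n.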



Concept Review 

• Rank 

• Nullity 

• Dimension Theorem 

• Overdetermined system 

• Underdetermined system 

• Fundamental spaces of a matrix 

• Relationships among the fundamental spaces 

• Orthogonal complement 

• Equivalent characterizations of invertible matrices 
Skills 

• Find the rank and nullity of a matrix. 

• Find the dimension of the row space of a matrix. 



Exercise Set 4.8 

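1. Verify that $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.

\[
A = \begin{bmatrix}
1 & 2 & 4 & 0 \\
-3 & 1 & 5 & 2 \\
-2 & 3 & 9 & 2
\end{bmatrix}
\]

Answer:

$\operatorname{rank}(A) = \operatorname{rank}(A^T) = 2$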

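2. Find the rank and nullity of the matrix; then verify that the values obtained satisfy Formula 4 in the Dimension
Theorem.

(a) $A = \begin{bmatrix} 1 & -1 & 3 \\ 5 & -4 & -4 \\ 7 & -6 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & 0 & -1 \\ 4 & 0 & -2 \\ 0 & 0 & 0 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 4 & 5 & 2 \\ 2 & 1 & 3 & 0 \\ -1 & 3 & 2 & 2 \end{bmatrix}$

(d) $A = \begin{bmatrix} 1 & 4 & 5 & 6 & 9 \\ 3 & -2 & 1 & 4 & -1 \\ -1 & 0 & -1 & -2 & -1 \\ 2 & 3 & 5 & 7 & 8 \end{bmatrix}$

(e) $A = \begin{bmatrix} 1 & -3 & 2 & 2 & 1 \\ 0 & 3 & 6 & 0 & -3 \\ 2 & -3 & -2 & 4 & 4 \\ 3 & -6 & 0 & 6 & 5 \\ -2 & 9 & 2 & -4 & -5 \end{bmatrix}$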



3. In each part of Exercise 2, use the results obtained to find the number of leading variables and the number of
parameters in the solution of $A\mathbf{x} = \mathbf{0}$ without solving the system.



Answer: 



(a) 2; 1 

(b) 1;2 

(c) 2; 2 

(d) 2; 3 

(e) 3; 2 

4. In each part, use the information in the table to find the dimension of the row space of A, column space of A,
null space of A, and null space of $A^T$.

             (a)     (b)     (c)     (d)     (e)     (f)     (g)
Size of A    3 x 3   3 x 3   3 x 3   5 x 9   9 x 5   4 x 4   6 x 2
Rank(A)      3       2       1       2       2       0       2



5. In each part, find the largest possible value for the rank of A and the smallest possible value for the nullity of A.

(a) A is 4 x 4

(b) A is 3 x 5

(c) A is 5 x 3

Answer:

(a) Rank = 4, nullity = 0

(b) Rank = 3, nullity = 2

(c) Rank = 3, nullity = 0

6. If A is an $m \times n$ matrix, what is the largest possible value for its rank and the smallest possible value for its
nullity?

7. In each part, use the information in the table to determine whether the linear system $A\mathbf{x} = \mathbf{b}$ is consistent. If so,
state the number of parameters in its general solution.

               (a)     (b)     (c)     (d)     (e)     (f)     (g)
Size of A      3 x 3   3 x 3   3 x 3   5 x 9   5 x 9   4 x 4   6 x 2
Rank(A)        3       2       1       2       2       0       2
Rank[A | b]    3       3       1       2       3       0       2



Answer: 



(a) Yes,0 

(b) No 

(c) Yes, 2 

(d) Yes, 7 

(e) No 

(f) Yes, 4 

(g) Yes, 0 

8. For each of the matrices in Exercise 7, find the nullity of A, and determine the number of parameters in the
general solution of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$.

9. What conditions must be satisfied by $b_1, b_2, b_3, b_4$, and $b_5$ for the overdetermined linear system

\[
\begin{aligned}
x_1 - 3x_2 &= b_1 \\
x_1 - 2x_2 &= b_2 \\
x_1 + x_2 &= b_3 \\
x_1 - 4x_2 &= b_4 \\
x_1 + 5x_2 &= b_5
\end{aligned}
\]

to be consistent?
Answer:

$b_1 = r$, $b_2 = s$, $b_3 = 4s - 3r$, $b_4 = 2r - s$, $b_5 = 8s - 7r$

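10. Let

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\]

Show that A has rank 2 if and only if one or more of the determinants

\[
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad
\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \quad
\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}
\]

is nonzero.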



11. Suppose that A is a $3 \times 3$ matrix whose null space is a line through the origin in 3-space. Can the row or column
space of A also be a line through the origin? Explain. 

Answer: 



No 

12. Discuss how the rank of A varies with t. 



(a) 



A = 



1 1 

1 i 



(b) 



A = 



t 1 1 
t 3 



-1 
-2 



13. Are there values of r and s for which

\[
\begin{bmatrix}
1 & 0 & 0 \\
0 & r - 2 & 2 \\
0 & s - 1 & r + 2 \\
0 & 0 & 3
\end{bmatrix}
\]

has rank 1? Has rank 2? If so, find those values.
Answer:

Rank is 2 if r = 2 and s = 1; the rank is never 1.

14. Use the result in Exercise 10 to show that the set of points (x, y, z) in $R^3$ for which the matrix

\[
\begin{bmatrix} x & y & z \\ 1 & x & y \end{bmatrix}
\]

has rank 1 is the curve with parametric equations $x = t$, $y = t^2$, $z = t^3$.

15. Prove: If $k \ne 0$, then A and kA have the same rank.

16. (a) Give an example of a $3 \times 3$ matrix whose column space is a plane through the origin in 3-space.

(b) What kind of geometric object is the null space of your matrix?

(c) What kind of geometric object is the row space of your matrix?

17. (a) If A is a $3 \times 5$ matrix, then the number of leading 1's in the reduced row echelon form of A is at most
______. Why?

(b) If A is a $3 \times 5$ matrix, then the number of parameters in the general solution of $A\mathbf{x} = \mathbf{0}$ is at most
______. Why?

(c) If A is a $5 \times 3$ matrix, then the number of leading 1's in the reduced row echelon form of A is at most
______. Why?

(d) If A is a $5 \times 3$ matrix, then the number of parameters in the general solution of $A\mathbf{x} = \mathbf{0}$ is at most
______. Why?

Answer: 

(a) 3 

(b) 5 

(c) 3 

(d) 3 



18. (a) If A is a 3 × 5 matrix, then the rank of A is at most ______. Why?

(b) If A is a 3 × 5 matrix, then the nullity of A is at most ______. Why?

(c) If A is a 3 × 5 matrix, then the rank of $A^T$ is at most ______. Why?

(d) If A is a 3 × 5 matrix, then the nullity of $A^T$ is at most ______. Why?

19. Find matrices A and B for which rank(A) = rank(B), but rank($A^2$) ≠ rank($B^2$).



Answer: 



20. Prove: If a matrix A is not square, then either the row vectors or the column vectors of A are linearly dependent. 

True-False Exercises 

In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 

(a) Either the row vectors or the column vectors of a square matrix are linearly independent. 
Answer: 

False 

(b) A matrix with linearly independent row vectors and linearly independent column vectors is square. 
Answer: 

True 

(c) The nullity of a nonzero m × n matrix is at most m.
Answer: 

False 

(d) Adding one additional column to a matrix increases its rank by one. 
Answer: 

False 

(e) The nullity of a square matrix with linearly dependent rows is at least one. 
Answer: 

True 

(f) If A is square and Ax = b is inconsistent for some vector b, then the nullity of A is zero. 
Answer: 

False 

(g) If a matrix A has more rows than columns, then the dimension of the row space is greater than the dimension of 
the column space. 

Answer: 

False 



(h) If rank($A^T$) = rank(A), then A is square.
Answer:

False



(i) There is no 3 × 3 matrix whose row space and null space are both lines in 3-space.
Answer:

True

(j) If V is a subspace of $R^n$ and W is a subspace of V, then $W^\perp$ is a subspace of $V^\perp$.
Answer: 

False 
4.9 Matrix Transformations from $R^n$ to $R^m$

In this section we will study functions of the form $\mathbf{w} = F(\mathbf{x})$, where the independent variable x is a vector in $R^n$ and the dependent variable w is a vector in $R^m$. We will concentrate on a special class of such functions called "matrix transformations." Such transformations are fundamental in the study of linear algebra and have important applications in physics, engineering, social sciences, and various branches of mathematics.



Functions and Transformations 

Recall that a function is a rule that associates with each element of a set A one and only one element in a set B. If f 
associates the element b with the element a, then we write 

b = f(a) 

and we say that b is the image of a under f or that f(a) is the value of f at a. The set A is called the domain of f and the set B the codomain of f (Figure 4.9.1). The subset of the codomain that consists of all images of points in the domain is called the range of f.

Figure 4.9.1 [a function f from the domain A to the codomain B]



For many common functions the domain and codomain are sets of real numbers, but in this text we will be concerned 
with functions for which the domain and codomain are vector spaces. 

DEFINITION 1

If V and W are vector spaces, and if f is a function with domain V and codomain W, then we say that f is a transformation from V to W or that f maps V to W, which we denote by writing

\[
f: V \to W
\]

In the special case where V = W, the transformation is also called an operator on V.

In this section we will be concerned exclusively with transformations from $R^n$ to $R^m$; transformations of general vector spaces will be considered in a later section. To illustrate one way in which such transformations can arise, suppose that $f_1, f_2, \ldots, f_m$ are real-valued functions of n variables, say

\[
\begin{aligned}
w_1 &= f_1(x_1, x_2, \ldots, x_n) \\
w_2 &= f_2(x_1, x_2, \ldots, x_n) \\
&\;\;\vdots \\
w_m &= f_m(x_1, x_2, \ldots, x_n)
\end{aligned}
\tag{1}
\]

These m equations assign a unique point $(w_1, w_2, \ldots, w_m)$ in $R^m$ to each point $(x_1, x_2, \ldots, x_n)$ in $R^n$ and thus define a transformation from $R^n$ to $R^m$. If we denote this transformation by T, then $T: R^n \to R^m$ and

\[
T(x_1, x_2, \ldots, x_n) = (w_1, w_2, \ldots, w_m)
\]



Matrix Transformations 



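In the special case where the equations in (1) are linear, they can be expressed in the form

\[
\begin{aligned}
w_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
w_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\;\;\vdots \\
w_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}
\tag{2}
\]

which we can write in matrix notation as

\[
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\tag{3}
\]

or more briefly as

\[
\mathbf{w} = A\mathbf{x}
\tag{4}
\]

Although we could view this as a linear system, we will view it instead as a transformation that maps the column vector x in $R^n$ into the column vector w in $R^m$ by multiplying x on the left by A. We call this a matrix transformation (or matrix operator if m = n), and we denote it by $T_A: R^n \to R^m$. With this notation, Equation (4) can be expressed as

\[
\mathbf{w} = T_A(\mathbf{x})
\tag{5}
\]

The matrix transformation $T_A$ is called multiplication by A, and the matrix A is called the standard matrix for the transformation.

We will also find it convenient, on occasion, to express (5) in the schematic form

\[
\mathbf{x} \xrightarrow{\;T_A\;} \mathbf{w}
\tag{6}
\]

which is read "$T_A$ maps x into w."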

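EXAMPLE 1 A Matrix Transformation from $R^4$ to $R^3$

The matrix transformation $T: R^4 \to R^3$ defined by the equations

\[
\begin{aligned}
w_1 &= 2x_1 - 3x_2 + x_3 - 5x_4 \\
w_2 &= 4x_1 + x_2 - 2x_3 + x_4 \\
w_3 &= 5x_1 - x_2 + 4x_3
\end{aligned}
\tag{7}
\]

can be expressed in matrix form as

\[
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}
=
\begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
\tag{8}
\]

so the standard matrix for T is

\[
A = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix}
\]

The image of a point $(x_1, x_2, x_3, x_4)$ can be computed directly from the defining equations (7) or from (8) by matrix multiplication. For example, if

\[
(x_1, x_2, x_3, x_4) = (1, -3, 0, 2)
\]

then substituting in (7) yields $w_1 = 1$, $w_2 = 3$, $w_3 = 8$ (verify), or alternatively from (8),

\[
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}
=
\begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ -3 \\ 0 \\ 2 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 3 \\ 8 \end{bmatrix}
\]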
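
The computation in Example 1 can be checked numerically. The following is a minimal sketch using only plain Python lists; the helper name `mat_vec` is an assumption introduced for illustration and does not appear in the text.

```python
def mat_vec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x (a list)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Standard matrix of the transformation T in Example 1
A = [[2, -3, 1, -5],
     [4, 1, -2, 1],
     [5, -1, 4, 0]]

# Image of the point (1, -3, 0, 2) under T
w = mat_vec(A, [1, -3, 0, 2])
print(w)  # [1, 3, 8]
```

Each entry of the result is the dot product of one row of A with x, which is the same arithmetic as substituting into the defining equations.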



Some Notational Matters 

Sometimes we will want to denote a matrix transformation without giving a name to the matrix itself. In such cases we will denote the standard matrix for $T: R^n \to R^m$ by the symbol [T]. Thus, the equation

\[
T(\mathbf{x}) = [T]\mathbf{x}
\tag{9}
\]

is simply the statement that T is a matrix transformation with standard matrix [T], and the image of x under this transformation is the product of the matrix [T] and the column vector x.



Properties of Matrix Transformations 

The following theorem lists four basic properties of matrix transformations that follow from properties of matrix 
multiplication. 



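THEOREM 4.9.1

For every matrix A the matrix transformation $T_A: R^n \to R^m$ has the following properties for all vectors u and v in $R^n$ and for every scalar k:

(a) $T_A(\mathbf{0}) = \mathbf{0}$

(b) $T_A(k\mathbf{u}) = kT_A(\mathbf{u})$ [Homogeneity property]

(c) $T_A(\mathbf{u} + \mathbf{v}) = T_A(\mathbf{u}) + T_A(\mathbf{v})$ [Additivity property]

(d) $T_A(\mathbf{u} - \mathbf{v}) = T_A(\mathbf{u}) - T_A(\mathbf{v})$

Proof All four parts are restatements of familiar properties of matrix multiplication:

\[
A\mathbf{0} = \mathbf{0}, \quad A(k\mathbf{u}) = k(A\mathbf{u}), \quad A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v}, \quad A(\mathbf{u} - \mathbf{v}) = A\mathbf{u} - A\mathbf{v}
\]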



It follows from Theorem 4.9.1 that a matrix transformation maps linear combinations of vectors in $R^n$ into the corresponding linear combinations in $R^m$ in the sense that

\[
T_A(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1T_A(\mathbf{u}_1) + k_2T_A(\mathbf{u}_2) + \cdots + k_rT_A(\mathbf{u}_r)
\tag{10}
\]

Depending on whether n-tuples and m-tuples are regarded as vectors or points, the geometric effect of a matrix transformation $T_A: R^n \to R^m$ is to map each vector (point) in $R^n$ into a vector (point) in $R^m$ (Figure 4.9.2).

Figure 4.9.2 [T maps vectors to vectors; T maps points to points]



The following theorem states that if two matrix transformations from $R^n$ to $R^m$ have the same image at each point of $R^n$, then the matrices themselves must be the same.

THEOREM 4.9.2

If $T_A: R^n \to R^m$ and $T_B: R^n \to R^m$ are matrix transformations, and if $T_A(\mathbf{x}) = T_B(\mathbf{x})$ for every vector x in $R^n$, then A = B.

Proof To say that $T_A(\mathbf{x}) = T_B(\mathbf{x})$ for every vector in $R^n$ is the same as saying that

\[
A\mathbf{x} = B\mathbf{x}
\]

for every vector x in $R^n$. This is true, in particular, if x is any of the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $R^n$; that is,

\[
A\mathbf{e}_j = B\mathbf{e}_j \quad (j = 1, 2, \ldots, n)
\tag{11}
\]

Since every entry of $\mathbf{e}_j$ is 0 except for the jth, which is 1, it follows from Theorem 1.3.1 that $A\mathbf{e}_j$ is the jth column of A and $B\mathbf{e}_j$ is the jth column of B. Thus, it follows from (11) that corresponding columns of A and B are the same, and hence that A = B.



EXAMPLE 2 Zero Transformations

If 0 is the m × n zero matrix, then

\[
T_0(\mathbf{x}) = 0\mathbf{x} = \mathbf{0}
\]

so multiplication by zero maps every vector in $R^n$ into the zero vector in $R^m$. We call $T_0$ the zero transformation from $R^n$ to $R^m$.

EXAMPLE 3 Identity Operators

If I is the n × n identity matrix, then

\[
T_I(\mathbf{x}) = I\mathbf{x} = \mathbf{x}
\]

so multiplication by I maps every vector in $R^n$ into itself. We call $T_I$ the identity operator on $R^n$.



A Procedure for Finding Standard Matrices 

There is a way of finding the standard matrix for a matrix transformation from $R^n$ to $R^m$ by considering the effect of that transformation on the standard basis vectors for $R^n$. To explain the idea, suppose that A is unknown and that

\[
\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n
\]

are the standard basis vectors for $R^n$. Suppose also that the images of these vectors under the transformation $T_A$ are

\[
T_A(\mathbf{e}_1) = A\mathbf{e}_1, \quad T_A(\mathbf{e}_2) = A\mathbf{e}_2, \quad \ldots, \quad T_A(\mathbf{e}_n) = A\mathbf{e}_n
\]

It follows from Theorem 1.3.1 that $A\mathbf{e}_j$ is a linear combination of the columns of A in which the successive coefficients are the entries of $\mathbf{e}_j$. But all entries of $\mathbf{e}_j$ are zero except the jth, so the product $A\mathbf{e}_j$ is just the jth column of the matrix A. Thus,

\[
A = [\,T_A(\mathbf{e}_1) \mid T_A(\mathbf{e}_2) \mid \cdots \mid T_A(\mathbf{e}_n)\,]
\tag{12}
\]



In summary, we have the following procedure for finding the standard matrix for a matrix transformation:

Finding the Standard Matrix for a Matrix Transformation

Step 1. Find the images of the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $R^n$ in column form.

Step 2. Construct the matrix that has the images obtained in Step 1 as its successive columns. This matrix is the standard matrix for the transformation.
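
The two steps above can be sketched in code. This is an illustrative sketch only: the function name `standard_matrix` is hypothetical, and the transformation `T` is the one from Example 1, supplied here as a Python function so that the recovered matrix can be compared with the one found there.

```python
def standard_matrix(T, n):
    """Build the standard matrix of T: R^n -> R^m from the images T(e_j)."""
    # Step 1: find the images of the standard basis vectors e_1, ..., e_n.
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(list(T(e_j)))
    # Step 2: use those images as the successive columns of the matrix.
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def T(x):
    """The transformation of Example 1, given by its defining equations."""
    x1, x2, x3, x4 = x
    return (2*x1 - 3*x2 + x3 - 5*x4,
            4*x1 + x2 - 2*x3 + x4,
            5*x1 - x2 + 4*x3)


A = standard_matrix(T, 4)
for row in A:
    print(row)
```

The procedure recovers the right matrix only when T really is a matrix transformation; for a nonlinear T it still returns a matrix, but that matrix does not reproduce T.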



Reflection Operators 



Some of the most basic matrix operators on $R^2$ and $R^3$ are those that map each point into its symmetric image about a fixed line or a fixed plane; these are called reflection operators. Table 1 shows the standard matrices for the reflections about the coordinate axes in $R^2$, and Table 2 shows the standard matrices for the reflections about the coordinate planes in $R^3$. In each case the standard matrix was obtained by finding the images of the standard basis vectors, converting those images to column vectors, and then using those column vectors as successive columns of the standard matrix.

Table 1

Operator                              Images of e1 and e2            Standard Matrix

Reflection about the y-axis           T(e1) = T(1, 0) = (-1, 0)      [ -1   0 ]
T(x, y) = (-x, y)                     T(e2) = T(0, 1) = (0, 1)       [  0   1 ]

Reflection about the x-axis           T(e1) = T(1, 0) = (1, 0)       [  1   0 ]
T(x, y) = (x, -y)                     T(e2) = T(0, 1) = (0, -1)      [  0  -1 ]

Reflection about the line y = x       T(e1) = T(1, 0) = (0, 1)       [  0   1 ]
T(x, y) = (y, x)                      T(e2) = T(0, 1) = (1, 0)       [  1   0 ]

Table 2

Operator                              Images of e1, e2, e3               Standard Matrix

Reflection about the xy-plane         T(e1) = T(1, 0, 0) = (1, 0, 0)     [  1   0   0 ]
T(x, y, z) = (x, y, -z)               T(e2) = T(0, 1, 0) = (0, 1, 0)     [  0   1   0 ]
                                      T(e3) = T(0, 0, 1) = (0, 0, -1)    [  0   0  -1 ]

Reflection about the xz-plane         T(e1) = T(1, 0, 0) = (1, 0, 0)     [  1   0   0 ]
T(x, y, z) = (x, -y, z)               T(e2) = T(0, 1, 0) = (0, -1, 0)    [  0  -1   0 ]
                                      T(e3) = T(0, 0, 1) = (0, 0, 1)     [  0   0   1 ]

Reflection about the yz-plane         T(e1) = T(1, 0, 0) = (-1, 0, 0)    [ -1   0   0 ]
T(x, y, z) = (-x, y, z)               T(e2) = T(0, 1, 0) = (0, 1, 0)     [  0   1   0 ]
                                      T(e3) = T(0, 0, 1) = (0, 0, 1)     [  0   0   1 ]



Projection Operators 



Matrix operators on $R^2$ and $R^3$ that map each point into its orthogonal projection on a fixed line or plane are called projection operators (or more precisely, orthogonal projection operators). Table 3 shows the standard matrices for the orthogonal projections on the coordinate axes in $R^2$, and Table 4 shows the standard matrices for the orthogonal projections on the coordinate planes in $R^3$.

Table 3

Operator                                   Images of e1 and e2           Standard Matrix

Orthogonal projection on the x-axis        T(e1) = T(1, 0) = (1, 0)      [ 1  0 ]
T(x, y) = (x, 0)                           T(e2) = T(0, 1) = (0, 0)      [ 0  0 ]

Orthogonal projection on the y-axis        T(e1) = T(1, 0) = (0, 0)      [ 0  0 ]
T(x, y) = (0, y)                           T(e2) = T(0, 1) = (0, 1)      [ 0  1 ]

Table 4

Operator                                   Images of e1, e2, e3              Standard Matrix

Orthogonal projection on the xy-plane      T(e1) = T(1, 0, 0) = (1, 0, 0)    [ 1  0  0 ]
T(x, y, z) = (x, y, 0)                     T(e2) = T(0, 1, 0) = (0, 1, 0)    [ 0  1  0 ]
                                           T(e3) = T(0, 0, 1) = (0, 0, 0)    [ 0  0  0 ]

Orthogonal projection on the xz-plane      T(e1) = T(1, 0, 0) = (1, 0, 0)    [ 1  0  0 ]
T(x, y, z) = (x, 0, z)                     T(e2) = T(0, 1, 0) = (0, 0, 0)    [ 0  0  0 ]
                                           T(e3) = T(0, 0, 1) = (0, 0, 1)    [ 0  0  1 ]

Orthogonal projection on the yz-plane      T(e1) = T(1, 0, 0) = (0, 0, 0)    [ 0  0  0 ]
T(x, y, z) = (0, y, z)                     T(e2) = T(0, 1, 0) = (0, 1, 0)    [ 0  1  0 ]
                                           T(e3) = T(0, 0, 1) = (0, 0, 1)    [ 0  0  1 ]





Rotation Operators 



Matrix operators on $R^2$ and $R^3$ that move points along circular arcs are called rotation operators. Let us consider how to find the standard matrix for the rotation operator $T: R^2 \to R^2$ that moves points counterclockwise about the origin through an angle θ (Figure 4.9.3). As illustrated in Figure 4.9.3, the images of the standard basis vectors are

\[
T(\mathbf{e}_1) = T(1, 0) = (\cos\theta, \sin\theta) \quad \text{and} \quad T(\mathbf{e}_2) = T(0, 1) = (-\sin\theta, \cos\theta)
\]

so the standard matrix for T is

\[
[\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2)\,] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]

Figure 4.9.3

In keeping with common usage we will denote this operator by $R_\theta$ and call

\[
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\tag{13}
\]

the rotation matrix for $R^2$. If $\mathbf{x} = (x, y)$ is a vector in $R^2$, and if $\mathbf{w} = (w_1, w_2)$ is its image under the rotation, then the relationship $\mathbf{w} = R_\theta\mathbf{x}$ can be written in component form as

\[
\begin{aligned}
w_1 &= x\cos\theta - y\sin\theta \\
w_2 &= x\sin\theta + y\cos\theta
\end{aligned}
\tag{14}
\]

These are called the rotation equations for $R^2$. These ideas are summarized in Table 5.

Table 5

Operator                           Rotation Equations                      Standard Matrix

Rotation through an angle θ        w1 = x cos θ - y sin θ                  [ cos θ   -sin θ ]
                                   w2 = x sin θ + y cos θ                  [ sin θ    cos θ ]


In the plane, counterclockwise angles are positive and clockwise angles are negative. The rotation matrix for a clockwise rotation of -θ radians can be obtained by replacing θ by -θ in (13). After simplification this yields

\[
R_{-\theta} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\]



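EXAMPLE 4 A Rotation Operator

Find the image of $\mathbf{x} = (1, 1)$ under a rotation of π/6 radians (= 30°) about the origin.

Solution It follows from (13) with θ = π/6 that

\[
R_{\pi/6}\mathbf{x}
= \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\[2pt] \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} \frac{\sqrt{3} - 1}{2} \\[2pt] \frac{1 + \sqrt{3}}{2} \end{bmatrix}
\approx \begin{bmatrix} 0.37 \\ 1.37 \end{bmatrix}
\]

or in comma-delimited notation, $R_{\pi/6}(1, 1) \approx (0.37, 1.37)$.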
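
Example 4 can be reproduced numerically from the rotation matrix for $R^2$. The following is a minimal sketch using Python's standard `math` module; the helper names `rotation_matrix` and `mat_vec` are assumptions introduced for illustration.

```python
import math


def rotation_matrix(theta):
    """Standard matrix for a counterclockwise rotation of R^2 through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def mat_vec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x (a list)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Image of x = (1, 1) under a rotation of pi/6 radians
w = mat_vec(rotation_matrix(math.pi / 6), [1, 1])
print([round(v, 2) for v in w])  # [0.37, 1.37]
```

Passing a negative angle to `rotation_matrix` gives the clockwise rotation, since cos(-θ) = cos θ and sin(-θ) = -sin θ.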



Rotations in $R^3$

A rotation of vectors in $R^3$ is usually described in relation to a ray emanating from the origin, called the axis of rotation. As a vector revolves around the axis of rotation, it sweeps out some portion of a cone (Figure 4.9.4a). The angle of rotation, which is measured in the base of the cone, is described as "clockwise" or "counterclockwise" in relation to a viewpoint that is along the axis of rotation looking toward the origin. For example, in Figure 4.9.4a the vector w results from rotating the vector x counterclockwise around the axis l through an angle θ. As in $R^2$, angles are positive if they are generated by counterclockwise rotations and negative if they are generated by clockwise rotations.

Figure 4.9.4 [(a) Angle of rotation; (b) Right-hand rule]

The most common way of describing a general axis of rotation is to specify a nonzero vector u that runs along the axis of rotation and has its initial point at the origin. The counterclockwise direction for a rotation about the axis can then be determined by a "right-hand rule" (Figure 4.9.4b): If the thumb of the right hand points in the direction of u, then the cupped fingers point in a counterclockwise direction.

A rotation operator on $R^3$ is a matrix operator that rotates each vector in $R^3$ about some rotation axis through a fixed angle θ. In Table 6 we have described the rotation operators on $R^3$ whose axes of rotation are the positive coordinate axes. For each of these rotations one of the components is unchanged, and the relationships between the other components can be derived by the same procedure used to derive (14). For example, in the rotation about the z-axis, the z-components of x and $\mathbf{w} = T(\mathbf{x})$ are the same, and the x- and y-components are related as in (14). This yields the rotation equation shown in the last row of Table 6.



Table 6

Operator                                    Rotation Equations               Standard Matrix

Counterclockwise rotation about the         w1 = x                           [ 1      0        0     ]
positive x-axis through an angle θ          w2 = y cos θ - z sin θ           [ 0    cos θ   -sin θ   ]
                                            w3 = y sin θ + z cos θ           [ 0    sin θ    cos θ   ]

Counterclockwise rotation about the         w1 = x cos θ + z sin θ           [  cos θ    0    sin θ  ]
positive y-axis through an angle θ          w2 = y                           [    0      1      0    ]
                                            w3 = -x sin θ + z cos θ          [ -sin θ    0    cos θ  ]

Counterclockwise rotation about the         w1 = x cos θ - y sin θ           [ cos θ   -sin θ    0   ]
positive z-axis through an angle θ          w2 = x sin θ + y cos θ           [ sin θ    cos θ    0   ]
                                            w3 = z                           [   0        0      1   ]


For completeness, we note that the standard matrix for a counterclockwise rotation through an angle θ about an axis in $R^3$, which is determined by an arbitrary unit vector $\mathbf{u} = (a, b, c)$ that has its initial point at the origin, is

\[
\begin{bmatrix}
a^2(1 - \cos\theta) + \cos\theta & ab(1 - \cos\theta) - c\sin\theta & ac(1 - \cos\theta) + b\sin\theta \\
ab(1 - \cos\theta) + c\sin\theta & b^2(1 - \cos\theta) + \cos\theta & bc(1 - \cos\theta) - a\sin\theta \\
ac(1 - \cos\theta) - b\sin\theta & bc(1 - \cos\theta) + a\sin\theta & c^2(1 - \cos\theta) + \cos\theta
\end{bmatrix}
\tag{15}
\]

The derivation can be found in the book Principles of Interactive Computer Graphics, by W. M. Newman and R. F. Sproull (New York: McGraw-Hill, 1979). You may find it instructive to derive the results in Table 6 as special cases of this more general result.
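
Formula 15 can be written as a short function and compared with Table 6. The sketch below is illustrative only: the function name `axis_rotation_matrix` is hypothetical, and the normalization step is an added convenience so that a non-unit axis vector can be passed in (Formula 15 itself assumes a unit vector).

```python
import math


def axis_rotation_matrix(u, theta):
    """Rotation of R^3 through theta radians about the axis determined by u."""
    a, b, c = u
    norm = math.sqrt(a * a + b * b + c * c)
    a, b, c = a / norm, b / norm, c / norm  # Formula 15 requires a unit vector
    C, S = math.cos(theta), math.sin(theta)
    k = 1 - C
    return [
        [a * a * k + C,     a * b * k - c * S, a * c * k + b * S],
        [a * b * k + c * S, b * b * k + C,     b * c * k - a * S],
        [a * c * k - b * S, b * c * k + a * S, c * c * k + C],
    ]


theta = math.pi / 3
general = axis_rotation_matrix((0, 0, 1), theta)

# Last row of Table 6: rotation about the positive z-axis
table_6_z = [[math.cos(theta), -math.sin(theta), 0],
             [math.sin(theta),  math.cos(theta), 0],
             [0, 0, 1]]

matches = all(abs(g - t) < 1e-12
              for g_row, t_row in zip(general, table_6_z)
              for g, t in zip(g_row, t_row))
print(matches)  # True
```

Choosing u = (1, 0, 0) or u = (0, 1, 0) in the same way reproduces the other two rows of Table 6.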



Dilations and Contractions 



If k is a nonnegative scalar, then the operator $T(\mathbf{x}) = k\mathbf{x}$ on $R^2$ or $R^3$ has the effect of increasing or decreasing the length of each vector by a factor of k. If $0 < k < 1$ the operator is called a contraction with factor k, and if $k > 1$ it is called a dilation with factor k (Figure 4.9.5). If $k = 1$, then T is the identity operator and can be regarded either as a contraction or a dilation. Tables 7 and 8 illustrate these operators.

Figure 4.9.5 [$T(\mathbf{x}) = k\mathbf{x}$: (a) 0 < k < 1; (b) k > 1]

Table 7

Operator                                             Effect on the Standard Basis      Standard Matrix
T(x, y) = (kx, ky)

Contraction with factor k on R^2 (0 < k < 1)         (1, 0) -> (k, 0)                  [ k  0 ]
Dilation with factor k on R^2 (k > 1)                (0, 1) -> (0, k)                  [ 0  k ]

Table 8

Operator                                             Standard Matrix
T(x, y, z) = (kx, ky, kz)

Contraction with factor k on R^3 (0 < k < 1)         [ k  0  0 ]
Dilation with factor k on R^3 (k > 1)                [ 0  k  0 ]
                                                     [ 0  0  k ]



Yaw, Pitch, and Roll 

In aeronautics and astronautics, the orientation of an aircraft or space shuttle relative to an xyz-coordinate system is often described in terms of angles called yaw, pitch, and roll. If, for example, an aircraft is flying along the y-axis and the xy-plane defines the horizontal, then the aircraft's angle of rotation about the z-axis is called the yaw, its angle of rotation about the x-axis is called the pitch, and its angle of rotation about the y-axis is called the roll. A combination of yaw, pitch, and roll can be achieved by a single rotation about some axis through the origin. This is, in fact, how a space shuttle makes attitude adjustments: it doesn't perform each rotation separately; it calculates one axis, and rotates about that axis to get the correct orientation. Such rotation maneuvers are used to align an antenna, point the nose toward a celestial object, or position a payload bay for docking.

[Figure: yaw, pitch, and roll of an aircraft]
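
The claim that a combination of yaw, pitch, and roll amounts to a single rotation can be explored numerically. The sketch below multiplies the three coordinate-axis rotation matrices and recovers the angle of the combined rotation from the standard identity cos θ = (tr(A) - 1)/2 for a 3 × 3 rotation matrix A. The order in which the three rotations are applied is an assumption made here for illustration (the text does not fix one), and all function names are hypothetical.

```python
import math


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def mat_mul(A, B):
    """Product of two 3 x 3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def combined_rotation(yaw, pitch, roll):
    # Assumed order: roll (y-axis) first, then pitch (x-axis), then yaw (z-axis).
    return mat_mul(rot_z(yaw), mat_mul(rot_x(pitch), rot_y(roll)))


def rotation_angle(A):
    """Angle of the single rotation represented by the rotation matrix A."""
    trace = A[0][0] + A[1][1] + A[2][2]
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, (trace - 1) / 2)))


A = combined_rotation(math.radians(30), math.radians(20), math.radians(10))
print(math.degrees(rotation_angle(A)))
```

A different order of application generally gives a different matrix, because matrix multiplication is not commutative; the product is a rotation matrix in every case.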



Expansions and Compressions

In a dilation or contraction of $R^2$ or $R^3$, all coordinates are multiplied by a factor k. If only one of the coordinates is multiplied by k, then the resulting operator is called an expansion or compression with factor k. This is illustrated in Table 9 for $R^2$. You should have no trouble extending these results to $R^3$.

Table 9

Operator                                                             Effect on the Standard Basis     Standard Matrix

Compression of R^2 in the x-direction with factor k (0 < k < 1)      (1, 0) -> (k, 0)                 [ k  0 ]
Expansion of R^2 in the x-direction with factor k (k > 1)            (0, 1) -> (0, 1)                 [ 0  1 ]

Compression of R^2 in the y-direction with factor k (0 < k < 1)      (1, 0) -> (1, 0)                 [ 1  0 ]
Expansion of R^2 in the y-direction with factor k (k > 1)            (0, 1) -> (0, k)                 [ 0  k ]





Shears 



A matrix operator of the form $T(x, y) = (x + ky, y)$ translates a point (x, y) in the xy-plane parallel to the x-axis by an amount ky that is proportional to the y-coordinate of the point. This operator leaves the points on the x-axis fixed (since y = 0), but as we progress away from the x-axis, the translation distance increases. We call this operator the shear in the x-direction with factor k. Similarly, a matrix operator of the form $T(x, y) = (x, y + kx)$ is called the shear in the y-direction with factor k. Table 10 illustrates the basic information about shears in $R^2$.

Table 10

Operator                                               Effect on the Standard Basis     Standard Matrix

Shear of R^2 in the x-direction with factor k          (1, 0) -> (1, 0)                 [ 1  k ]
T(x, y) = (x + ky, y)                                  (0, 1) -> (k, 1)                 [ 0  1 ]

Shear of R^2 in the y-direction with factor k          (1, 0) -> (1, k)                 [ 1  0 ]
T(x, y) = (x, y + kx)                                  (0, 1) -> (0, 1)                 [ k  1 ]

[The illustrations show each shear for k > 0 and for k < 0.]



EXAMPLE 5 Some Basic Matrix Operators on $R^2$

In each part describe the matrix operator corresponding to A, and show its effect on the unit square.

\[
\text{(a) } A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \qquad
\text{(b) } A_2 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \qquad
\text{(c) } A_3 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}
\]

Solution By comparing the forms of these matrices to those in Tables 7, 9, and 10, we see that the matrix $A_1$ corresponds to a shear in the x-direction with factor 2, the matrix $A_2$ corresponds to a dilation with factor 2, and $A_3$ corresponds to an expansion in the x-direction with factor 2. The effects of these operators on the unit square are shown in Figure 4.9.6.

Figure 4.9.6
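
The effect of each operator in Example 5 on the unit square can be tabulated by applying the matrix to the four corners. This is a minimal sketch; the matrix used for the shear is the one implied by the solution (a shear in the x-direction with factor 2), and the helper name `mat_vec` and the dictionary keys are assumptions introduced for illustration.

```python
def mat_vec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x (a list)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Corners of the unit square, listed counterclockwise from the origin
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

operators = {
    "shear in x-direction, factor 2": [[1, 2], [0, 1]],
    "dilation, factor 2": [[2, 0], [0, 2]],
    "expansion in x-direction, factor 2": [[2, 0], [0, 1]],
}

images = {name: [tuple(mat_vec(A, p)) for p in square]
          for name, A in operators.items()}

for name, corners in images.items():
    print(name, corners)
```

Because a matrix operator maps line segments to line segments, the images of the four corners determine the image of the whole square.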



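OPTIONAL

Orthogonal Projections on Lines Through the Origin

In Table 3 we listed the standard matrices for the orthogonal projections on the coordinate axes in $R^2$. These are special cases of the more general operator $T: R^2 \to R^2$ that maps each point into its orthogonal projection on a line L through the origin that makes an angle θ with the positive x-axis (Figure 4.9.7). In Example 4 of Section 3.3 we used Formula 10 of that section to find the orthogonal projections of the standard basis vectors for $R^2$ on that line. Expressed in matrix form, we found those projections to be

\[
T(\mathbf{e}_1) = \begin{bmatrix} \cos^2\theta \\ \sin\theta\cos\theta \end{bmatrix}
\quad \text{and} \quad
T(\mathbf{e}_2) = \begin{bmatrix} \sin\theta\cos\theta \\ \sin^2\theta \end{bmatrix}
\]

Figure 4.9.7

Thus, the standard matrix for T is

\[
[T] = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2)\,]
= \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}
= \begin{bmatrix} \cos^2\theta & \frac{1}{2}\sin 2\theta \\[2pt] \frac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix}
\]

In keeping with common usage, we will denote this operator by

\[
P_\theta
= \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}
= \begin{bmatrix} \cos^2\theta & \frac{1}{2}\sin 2\theta \\[2pt] \frac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix}
\tag{16}
\]

We have included two versions of Formula 16 because both are commonly used. Whereas the first version involves only the angle θ, the second involves both θ and 2θ.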



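EXAMPLE 6 Orthogonal Projection on a Line Through the Origin

Use Formula 16 to find the orthogonal projection of the vector $\mathbf{x} = (1, 5)$ on the line through the origin that makes an angle of π/6 (= 30°) with the x-axis.

Solution Since $\sin(\pi/6) = 1/2$ and $\cos(\pi/6) = \sqrt{3}/2$, it follows from (16) that the standard matrix for this projection is

\[
P_{\pi/6}
= \begin{bmatrix} \cos^2(\pi/6) & \sin(\pi/6)\cos(\pi/6) \\ \sin(\pi/6)\cos(\pi/6) & \sin^2(\pi/6) \end{bmatrix}
= \begin{bmatrix} \frac{3}{4} & \frac{\sqrt{3}}{4} \\[2pt] \frac{\sqrt{3}}{4} & \frac{1}{4} \end{bmatrix}
\]

Thus,

\[
P_{\pi/6}\mathbf{x}
= \begin{bmatrix} \frac{3}{4} & \frac{\sqrt{3}}{4} \\[2pt] \frac{\sqrt{3}}{4} & \frac{1}{4} \end{bmatrix}
\begin{bmatrix} 1 \\ 5 \end{bmatrix}
= \begin{bmatrix} \frac{3 + 5\sqrt{3}}{4} \\[2pt] \frac{\sqrt{3} + 5}{4} \end{bmatrix}
\approx \begin{bmatrix} 2.91 \\ 1.68 \end{bmatrix}
\]

or in comma-delimited notation, $P_{\pi/6}(1, 5) \approx (2.91, 1.68)$.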
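
The projection in Example 6 can be computed from Formula 16 directly. The following is a minimal sketch using Python's standard `math` module; the helper names `projection_matrix` and `mat_vec` are assumptions introduced for illustration.

```python
import math


def projection_matrix(theta):
    """Standard matrix (Formula 16) for the orthogonal projection of R^2 on the
    line through the origin that makes an angle theta with the positive x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c],
            [s * c, s * s]]


def mat_vec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x (a list)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


x = [1, 5]
w = mat_vec(projection_matrix(math.pi / 6), x)
print(w)

# The residual x - w should be orthogonal to the projection w.
residual = [xi - wi for xi, wi in zip(x, w)]
dot = sum(r * wi for r, wi in zip(residual, w))
print(abs(dot) < 1e-12)  # True
```

The orthogonality check at the end is a useful sanity test for any orthogonal projection: what is removed from x must be perpendicular to what is kept.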



Reflections About Lines Through the Origin

In Table 1 we listed the reflections about the coordinate axes in $R^2$. These are special cases of the more general operator $H_\theta: R^2 \to R^2$ that maps each point into its reflection about a line L through the origin that makes an angle θ with the positive x-axis (Figure 4.9.8). We could find the standard matrix for $H_\theta$ by finding the images of the standard basis vectors, but instead we will take advantage of our work on orthogonal projections by using Formula 16 for $P_\theta$ to find a formula for $H_\theta$.

Figure 4.9.8

You should be able to see from Figure 4.9.9 that for every vector x in $R^2$

\[
P_\theta\mathbf{x} - \mathbf{x} = \tfrac{1}{2}(H_\theta\mathbf{x} - \mathbf{x})
\quad \text{or equivalently} \quad
H_\theta\mathbf{x} = (2P_\theta - I)\mathbf{x}
\]

Figure 4.9.9

Thus, it follows from Theorem 4.9.2 that

\[
H_\theta = 2P_\theta - I
\tag{17}
\]

and hence from (16) that

\[
H_\theta = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}
\tag{18}
\]
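
The step from the projection matrix to the reflection matrix can be spelled out entry by entry. The following worked computation is a sketch of that step; it assumes only the first form of Formula 16 and the double-angle identities $\cos 2\theta = 2\cos^2\theta - 1 = 1 - 2\sin^2\theta$ and $\sin 2\theta = 2\sin\theta\cos\theta$.

```latex
\[
H_\theta = 2P_\theta - I
= 2\begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}
- \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2\cos^2\theta - 1 & 2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & 2\sin^2\theta - 1 \end{bmatrix}
= \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}
\]
```

The lower-right entry uses $2\sin^2\theta - 1 = -(1 - 2\sin^2\theta) = -\cos 2\theta$, which is where the minus sign in the reflection matrix comes from.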
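EXAMPLE 7 Reflection About a Line Through the Origin

Find the reflection of the vector $\mathbf{x} = (1, 5)$ about the line through the origin that makes an angle of π/6 (= 30°) with the x-axis.

Solution Since $\sin(\pi/3) = \sqrt{3}/2$ and $\cos(\pi/3) = 1/2$, it follows from (18) that the standard matrix for this reflection is

\[
H_{\pi/6}
= \begin{bmatrix} \cos(\pi/3) & \sin(\pi/3) \\ \sin(\pi/3) & -\cos(\pi/3) \end{bmatrix}
= \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\[2pt] \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix}
\]

Thus,

\[
H_{\pi/6}\mathbf{x}
= \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\[2pt] \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix}
\begin{bmatrix} 1 \\ 5 \end{bmatrix}
= \begin{bmatrix} \frac{1 + 5\sqrt{3}}{2} \\[2pt] \frac{\sqrt{3} - 5}{2} \end{bmatrix}
\approx \begin{bmatrix} 4.83 \\ -1.63 \end{bmatrix}
\]

or in comma-delimited notation, $H_{\pi/6}(1, 5) \approx (4.83, -1.63)$.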







Show that the standard matrices in Tables 1 and 3 
are special cases of 18 and 16. 
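
A numerical spot check of these special cases is sketched below; it does not replace the algebraic verification the note asks for. The function names are hypothetical, and the reflection matrix is computed as twice the projection matrix minus the identity, which is an assumption consistent with the relationship between reflections and projections used in this section.

```python
import math


def projection_matrix(theta):
    """Orthogonal projection of R^2 on the line at angle theta (Formula 16)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c], [s * c, s * s]]


def reflection_matrix(theta):
    """Reflection of R^2 about the line at angle theta, computed as 2P - I."""
    P = projection_matrix(theta)
    return [[2 * P[0][0] - 1, 2 * P[0][1]],
            [2 * P[1][0], 2 * P[1][1] - 1]]


def close(A, B, tol=1e-12):
    """True if the matrices A and B agree entrywise to within tol."""
    return all(abs(a - b) < tol
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))


checks = {
    "reflection about the x-axis (theta = 0)":
        close(reflection_matrix(0), [[1, 0], [0, -1]]),
    "reflection about the y-axis (theta = pi/2)":
        close(reflection_matrix(math.pi / 2), [[-1, 0], [0, 1]]),
    "reflection about the line y = x (theta = pi/4)":
        close(reflection_matrix(math.pi / 4), [[0, 1], [1, 0]]),
    "projection on the x-axis (theta = 0)":
        close(projection_matrix(0), [[1, 0], [0, 0]]),
    "projection on the y-axis (theta = pi/2)":
        close(projection_matrix(math.pi / 2), [[0, 0], [0, 1]]),
}

for name, ok in checks.items():
    print(name, ok)
```

The comparisons use a small tolerance because cos(π/2) is not exactly zero in floating-point arithmetic.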



Concept Review 

• Function 

• Image 



• Value 

• Domain 

• Codomain 

• Transformation 

• Relationships among the fundamental spaces 

• Operator 

• Matrix transformation 

• Matrix operator 

• Standard matrix 

• Properties of matrix transformations 

• Zero transformation 

• Identity operator 

• Reflection operator 

• Projection operator 

• Rotation operator 

• Rotation matrix 

• Rotation equations 

• Axis of rotation in 3-space

• Angle of rotation in 3-space 

• Expansion operator 

• Compression operator 

• Shear 

• Dilation 

• Contraction 

Skills 

• Find the domain and codomain of a transformation, and determine whether the transformation is linear. 

• Find the standard matrix for a matrix transformation. 

• Describe the effect of a matrix operator on the standard basis in $R^n$.



Exercise Set 4.9 

In Exercises 1-2, find the domain and codomain of the transformation $T_A(\mathbf{x}) = A\mathbf{x}$.

1. (a) A has size 3 × 2.

(b) A has size 2 × 3.

(c) A has size 3 × 3.

(d) A has size 1 × 6.

Answer:

(a) Domain: $R^2$; codomain: $R^3$

(b) Domain: $R^3$; codomain: $R^2$

(c) Domain: $R^3$; codomain: $R^3$

(d) Domain: $R^6$; codomain: $R^1$

2. (a) A has size 4 × 5.

(b) A has size 5 × 4.

(c) A has size 4 × 4.

(d) A has size 3 × 1.

3. If 7(7:1, X2) = (^1 i " then the domain of Tis , the codomain of Tis , and 

the image of x = (1, — 2) under Tis . 



and the image of x = (0, —1,4) under T is . 

5. In each part, fmd the domain and codomain of the transformation defined by the equations, and determine whether 
the transformation is Hnear. 

(a) =37:1 -2:^2 i 

>V2 = 5a:i — 8x2+ X2 

(b) wi = 2j:i;r2- ^2 
W2= ^^1 +37:17:2 
W3= ^1 + ^2 

(c) vt'l = 57:1 -7:2-1-7:3 

W2 = 2x1 " ^^2 " ^3 

(d) vi?i= x^-3x2 I :t3 -2:^4 

W2 = 37:1 -47:2 -7:3 + 7:4 
Answer: 

(a) Linear; 53_^j22 

(b) Nonlinear; _^ 

(c) Linear; 53_^j^3 

(d) Nonlinear; /j4 _^ j|j2 

6. In each part, determine whether T is a matrix transformation. 



Answer: 



r\ r\ (-1,2,3) 



4. If 7(7:1, ^2» ^3) = (^1 + ^1 " 27:2), then the domain of Tis 



, the codomain of T is 



(a) nx.y) = (2x.y) 

(b) nx.y) = i-y.x) 

(c) T(,x.y) = i2x+y.x-y) 




(e) nx.y) = (x.y + \) 

7. In each part, determine whether T is a matrix transformation. 

(a) T(x,y,z) = (0.0) 

(b) nx,y,z) = (\,l) 

(c) nx.y.z)^(3x-4y,2x-5z) 

(d) T[x.y.z)=(^\z) 

(e) nx.y.z) = (y-\,x) 
Answer: 

(a) and (c) are matrix transformations; (b), (d), and (e) are not matrix transformations. 

8. Find the standard matrix for the transformation defined by the equations. 

X4 
X4 



(a) wi 




2x1 - 3jr2 + 


W2 




3^1 + 5x2 - 


(b) 




Ttti -i- 2x2 " 87:3 


W2 




-7:2 -f 57:3 


W2 




4x1 +7^2 — a:3 


(c) ^1 




-^1 + X2 


W2 




3x1 - 23:2 


W2 




5^:1 — 7x2 


(d) ^1 




^1 


W2 




X1+X2 


W3 




^1 +^2 + ^3 


W4 




XI +X2 + X2+X4 



9. Find the standard matrix for the operator $T: R^3 \to R^3$ defined by

\[
\begin{aligned}
w_1 &= 3x_1 + 5x_2 - x_3 \\
w_2 &= 4x_1 - x_2 + x_3 \\
w_3 &= 3x_1 + 2x_2 - x_3
\end{aligned}
\]

and then calculate $T(-1, 2, 4)$ by directly substituting in the equations and also by matrix multiplication.

Answer:

\[
\begin{bmatrix} 3 & 5 & -1 \\ 4 & -1 & 1 \\ 3 & 2 & -1 \end{bmatrix}; \quad T(-1, 2, 4) = (3, -2, -3)
\]



10. Find the standard matrix for the operator T defined by the formula.

(a) $T(x_1, x_2) = (2x_1 - x_2, x_1 + x_2)$

(b) $T(x_1, x_2) = (x_1, x_2)$

(c) $T(x_1, x_2, x_3) = (x_1 + 2x_2 + x_3, x_1 + 5x_2, x_3)$

(d) $T(x_1, x_2, x_3) = (4x_1, 7x_2, -8x_3)$

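11. Find the standard matrix for the transformation T defined by the formula.

(a) $T(x_1, x_2) = (x_2, -x_1, x_1 + 3x_2, x_1 - x_2)$

(b) $T(x_1, x_2, x_3, x_4) = (7x_1 + 2x_2 - x_3 + x_4, x_2 + x_3, -x_1)$

(c) $T(x_1, x_2, x_3) = (0, 0, 0, 0, 0)$

(d) $T(x_1, x_2, x_3, x_4) = (x_4, x_1, x_3, x_2, x_1 - x_3)$

Answer:

\[
\text{(a) } \begin{bmatrix} 0 & 1 \\ -1 & 0 \\ 1 & 3 \\ 1 & -1 \end{bmatrix} \qquad
\text{(b) } \begin{bmatrix} 7 & 2 & -1 & 1 \\ 0 & 1 & 1 & 0 \\ -1 & 0 & 0 & 0 \end{bmatrix}
\]

\[
\text{(c) } \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad
\text{(d) } \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 \end{bmatrix}
\]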

12. In each part, find T(x), and express the answer in matrix form. 

3' 
-2 



(a) 
(b) 



r-1 2 oi 

=[ 3 1 5r= 



(c) 



(d) 



-2 1 4 
3 5 7 
6 0-1 





•1 




1 




3_ 












'3 



-1 1 

2 4 
7 8 



x = 



13. In each part, use the standard matrix for T to find $T(\mathbf{x})$; then check the result by calculating $T(\mathbf{x})$ directly.

(a) $T(x_1, x_2) = (-x_1 + x_2, x_2)$; $\mathbf{x} = (-1, 4)$

(b) $T(x_1, x_2, x_3) = (2x_1 - x_2 + x_3, x_2 + x_3, 0)$; $\mathbf{x} = (2, 1, -3)$

Answer:

(a) $T(-1, 4) = (5, 4)$

(b) $T(2, 1, -3) = (0, -2, 0)$

14. Use matrix multiplication to find the reflection of (-1, 2) about

(a) the x-axis.

(b) the y-axis.

(c) the line y = x.

15. Use matrix multiplication to find the reflection of (2, -5, 3) about

(a) the xy-plane.

(b) the xz-plane.

(c) the yz-plane.

Answer:

(a) (2, -5, -3)

(b) (2, 5, 3)

(c) (-2, -5, 3)

16. Use matrix multiplication to find the orthogonal projection of (2, -5) on

(a) the x-axis.

(b) the y-axis.

17. Use matrix multiplication to find the orthogonal projection of (-2, 1, 3) on

(a) the xy-plane.

(b) the xz-plane.

(c) the yz-plane.

Answer:

(a) (-2, 1, 0)

(b) (-2, 0, 3)

(c) (0, 1, 3)

18. Use matrix multiplication to find the image of the vector (3, -4) when it is rotated through an angle of

(a) θ = 30°.

(b) θ = -60°.

(c) θ = 45°.

(d) θ = 90°.

19. Use matrix multiplication to find the image of the vector (-2, 1, 2) if it is rotated

(a) 30° about the x-axis.

(b) 45° about the y-axis.

(c) 90° about the z-axis.

Answer:

(c) (-1, -2, 2)

20. Find the standard matrix for the operator that rotates a vector in $R^3$ through an angle of -60° about

(a) the x-axis.

(b) the y-axis.

(c) the z-axis.

21. Use matrix multiplication to find the image of the vector (-2, 1, 2) if it is rotated

(a) -30° about the x-axis.

(b) -45° about the y-axis.

(c) -90° about the z-axis.



respectively. 

(a) Show that the orthogonal projections on the coordinate axes are matrix operators, and find their standard matrices.

(b) Show that if $T: R^3 \to R^3$ is an orthogonal projection on one of the coordinate axes, then for every vector $\mathbf{x}$ in $R^3$, the vectors $T(\mathbf{x})$ and $\mathbf{x} - T(\mathbf{x})$ are orthogonal.

(c) Make a sketch showing $\mathbf{x}$ and $\mathbf{x} - T(\mathbf{x})$ in the case where T is the orthogonal projection on the x-axis.

23. Use Formula 15 to derive the standard matrices for the rotations about the x-axis, y-axis, and z-axis in $R^3$.

24. Use Formula 15 to find the standard matrix for a rotation radians about the axis determined by the vector
v = (1, 1, 1). [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.]

25. Use Formula 15 to find the standard matrix for a rotation of 180° about the axis determined by the vector
v = (2, 2, 1). [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.]

Answer:

$\begin{bmatrix} -\frac{1}{9} & \frac{8}{9} & \frac{4}{9} \\ \frac{8}{9} & -\frac{1}{9} & \frac{4}{9} \\ \frac{4}{9} & \frac{4}{9} & -\frac{7}{9} \end{bmatrix}$



26. It can be proved that if $A$ is a $2 \times 2$ matrix with orthonormal column vectors and for which $\det(A) = 1$, then
multiplication by $A$ is a rotation through some angle $\theta$. Verify that



A = 



_ 1 

1 L 

/2 {1 



satisfies the stated conditions and find the angle of rotation. 

27. The result stated in Exercise 26 can be extended to $R^3$; that is, it can be proved that if $A$ is a $3 \times 3$ matrix with
orthonormal column vectors and for which $\det(A) = 1$, then multiplication by $A$ is a rotation about some axis
through some angle $\theta$. Use Formula 15 to show that the angle of rotation satisfies the equation

$\cos\theta = \dfrac{\operatorname{tr}(A) - 1}{2}$

28. Let $A$ be a $3 \times 3$ matrix (other than the identity matrix) satisfying the conditions stated in Exercise 27. It can be
shown that if $\mathbf{x}$ is any nonzero vector in $R^3$, then the vector $\mathbf{u} = A\mathbf{x} + A^T\mathbf{x} + [1 - \operatorname{tr}(A)]\mathbf{x}$ determines an axis of
rotation when $\mathbf{u}$ is positioned with its initial point at the origin. [See "The Axis of Rotation: Analysis, Algebra,
Geometry," by Dan Kalman, Mathematics Magazine, Vol. 62, No. 4, October 1989.]

(a) Show that multiplication by 



A = 



i 
9 

8 

0 

4 
■9 



4 
■9 

4 

2 
9 



is a rotation. 

(b) Find a vector of length 1 that defines an axis for the rotation. 

(c) Use the result in Exercise 27 to find the angle of rotation about the axis obtained in part (b).
29. In words, describe the geometric effect of multiplying a vector x by the matrix A. 



(b) 



Answer: 



(a) Twice the orthogonal projection on the x-axis. 

(b) Twice the reflection about the x-axis. 

30. In words, describe the geometric effect of multiplying a vector x by the matrix A. 



2 0 
0 3 



(b) 



A = 



il .1 

2 2 
2 2 



31. In words, describe the geometric effect of multiplying a vector x by the matrix

$A = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix}$

Answer:

Rotation through the angle $2\theta$.

32. If multiplication by $A$ rotates a vector $\mathbf{x}$ in the xy-plane through an angle $\theta$, what is the effect of multiplying $\mathbf{x}$ by $A^T$?
Explain your reasoning.

33. Let $\mathbf{x}_0$ be a nonzero column vector in $R^2$, and suppose that $T: R^2 \to R^2$ is the transformation defined by the formula
$T(\mathbf{x}) = \mathbf{x}_0 + R_\theta\mathbf{x}$, where $R_\theta$ is the standard matrix of the rotation of $R^2$ about the origin through the angle $\theta$. Give a
geometric description of this transformation. Is it a matrix transformation? Explain.

Answer:

Rotation through the angle $\theta$ and translation by $\mathbf{x}_0$; not a matrix transformation since $\mathbf{x}_0$ is nonzero.

34. A function of the form $f(x) = mx + b$ is commonly called a "linear function" because the graph of $y = mx + b$ is
a line. Is $f$ a matrix transformation on $R$?

35. Let $\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$ be a line in $R^n$, and let $T: R^n \to R^n$ be a matrix operator on $R^n$. What kind of geometric object is
the image of this line under the operator T? Explain your reasoning.

Answer:

A line in $R^n$.

True-False Exercises 

In parts (a)-(i) determine whether the statement is true or false, and justify your answer.

(a) If $A$ is a $2 \times 3$ matrix, then the domain of the transformation $T_A$ is $R^2$.
Answer:

False

(b) If $A$ is an $m \times n$ matrix, then the codomain of the transformation $T_A$ is $R^n$.
Answer:

False

(c) If $T: R^n \to R^m$ and $T(\mathbf{0}) = \mathbf{0}$, then T is a matrix transformation.
Answer:

False

(d) If $T: R^n \to R^m$ and $T(c_1\mathbf{x} + c_2\mathbf{y}) = c_1T(\mathbf{x}) + c_2T(\mathbf{y})$ for all scalars $c_1$ and $c_2$ and all vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^n$, then
T is a matrix transformation.

Answer:

True

(e) There is only one matrix transformation $T: R^n \to R^m$ such that $T(-\mathbf{x}) = -T(\mathbf{x})$ for every vector $\mathbf{x}$ in $R^n$.

Answer:

False

(f) There is only one matrix transformation $T: R^n \to R^m$ such that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x} - \mathbf{y})$ for all vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^n$.
Answer:

True

(g) If $\mathbf{b}$ is a nonzero vector in $R^n$, then $T(\mathbf{x}) = \mathbf{x} + \mathbf{b}$ is a matrix operator on $R^n$.
Answer:

False



(h) 



The matrix 



1 

2 
1 
2 



i 

2 
i 
2 



is the standard matrix for a rotation. 



Answer: 



False 



(i) The standard matrices of the reflections about the coordinate axes in 2-space have the form
$\begin{bmatrix} a & 0 \\ 0 & -a \end{bmatrix}$, where $a = \pm 1$.



Answer: 



True 
4.10 Properties of Matrix Transformations



In this section we will discuss properties of matrix transformations. We will show, for example, that if several 
matrix transformations are performed in succession, then the same result can be obtained by a single matrix 
transformation that is chosen appropriately. We will also explore the relationship between the invertibility of a 
matrix and properties of the corresponding transformation. 



Compositions of Matrix Transformations 

Suppose that $T_A$ is a matrix transformation from $R^n$ to $R^k$ and $T_B$ is a matrix transformation from $R^k$ to $R^m$. If $\mathbf{x}$
is a vector in $R^n$, then $T_A$ maps this vector into a vector $T_A(\mathbf{x})$ in $R^k$, and $T_B$, in turn, maps that vector into the
vector $T_B(T_A(\mathbf{x}))$ in $R^m$. This process creates a transformation from $R^n$ to $R^m$ that we call the composition of
$T_B$ with $T_A$ and denote by the symbol

$T_B \circ T_A$

which is read "$T_B$ circle $T_A$." As illustrated in Figure 4.10.1, the transformation $T_A$ in the formula is performed
first; that is,

$(T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x}))$ (1)

This composition is itself a matrix transformation since

$(T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) = B(A\mathbf{x}) = (BA)\mathbf{x}$

which shows that it is multiplication by $BA$. This is expressed by the formula

$T_B \circ T_A = T_{BA}$ (2)
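As a numerical check of Formula 2, the following minimal Python sketch compares the two-step computation $T_B(T_A(\mathbf{x}))$ with the single multiplication $(BA)\mathbf{x}$. The matrices A and B and the vector x are illustrative choices, not taken from the text, and the helper names `mat_vec` and `mat_mul` are introduced here only for the sketch.

```python
def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def mat_mul(P, Q):
    """Return the product PQ of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


# Illustrative matrices (assumed for this sketch):
# A is 2x3, so T_A maps R^3 to R^2; B is 3x2, so T_B maps R^2 to R^3.
A = [[1, 2, 0],
     [0, 1, -1]]
B = [[2, 1],
     [0, 3],
     [1, -1]]
x = [1, 2, 3]

step_by_step = mat_vec(B, mat_vec(A, x))   # T_B(T_A(x)): apply T_A first
single_step = mat_vec(mat_mul(B, A), x)    # (BA)x: one multiplication
print(step_by_step, single_step)
```

Both computations give the same vector, which is what Formula 2 asserts for this particular choice of A, B, and x.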



WARNING

Just as it is not true, in general, that

$AB = BA$

so it is not true, in general, that

$T_B \circ T_A = T_A \circ T_B$

That is, order matters when matrix
transformations are composed.




Figure 4.10.1 



Compositions can be defined for any finite succession of matrix transformations whose domains and ranges have
the appropriate dimensions. For example, to extend Formula 2 to three factors, consider the matrix
transformations

$T_A: R^n \to R^k, \quad T_B: R^k \to R^l, \quad T_C: R^l \to R^m$

We define the composition $(T_C \circ T_B \circ T_A): R^n \to R^m$ by

$(T_C \circ T_B \circ T_A)(\mathbf{x}) = T_C(T_B(T_A(\mathbf{x})))$

As above, it can be shown that this is a matrix transformation whose standard matrix is $CBA$ and that

$T_C \circ T_B \circ T_A = T_{CBA}$ (3)

As in Formula 9 of Section 4.9, we can use square brackets to denote a matrix transformation without
referencing a specific matrix. Thus, for example, the formula

$[T_2 \circ T_1] = [T_2][T_1]$ (4)

is a restatement of Formula 2 which states that the standard matrix for a composition is the product of the
standard matrices in the appropriate order. Similarly,

$[T_3 \circ T_2 \circ T_1] = [T_3][T_2][T_1]$ (5)

is a restatement of Formula 3.

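EXAMPLE 1 Composition of Two Rotations

Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the matrix operators that rotate vectors through the angles $\theta_1$
and $\theta_2$, respectively. Thus the operation

$(T_2 \circ T_1)(\mathbf{x}) = T_2(T_1(\mathbf{x}))$

first rotates $\mathbf{x}$ through the angle $\theta_1$, then rotates $T_1(\mathbf{x})$ through the angle $\theta_2$. It follows that the net
effect of $T_2 \circ T_1$ is to rotate each vector in $R^2$ through the angle $\theta_1 + \theta_2$ (Figure 4.10.2). Thus, the
standard matrices for these matrix operators are

$[T_1] = \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix}, \quad [T_2 \circ T_1] = \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix}$

These matrices should satisfy 4. With the help of some basic trigonometric identities, we can
confirm that this is so as follows:

$[T_2][T_1] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix} \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}$

$= \begin{bmatrix} \cos\theta_2\cos\theta_1 - \sin\theta_2\sin\theta_1 & -(\cos\theta_2\sin\theta_1 + \sin\theta_2\cos\theta_1) \\ \sin\theta_2\cos\theta_1 + \cos\theta_2\sin\theta_1 & -\sin\theta_2\sin\theta_1 + \cos\theta_2\cos\theta_1 \end{bmatrix}$

$= \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix} = [T_2 \circ T_1]$

Figure 4.10.2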
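The identity in Example 1 can also be checked numerically. The sketch below is only an illustration: the angles 30° and 45° are arbitrary choices, and `rotation` and `mat_mul` are helper names assumed for this sketch.

```python
import math


def rotation(theta):
    """Standard matrix for a counterclockwise rotation of R^2 through theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def mat_mul(P, Q):
    """Return the product PQ of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


theta1, theta2 = math.radians(30), math.radians(45)  # arbitrary test angles

product = mat_mul(rotation(theta2), rotation(theta1))  # [T2][T1]
combined = rotation(theta1 + theta2)                   # rotation through theta1 + theta2

# Largest entrywise difference; it should be zero up to rounding error.
max_diff = max(abs(product[i][j] - combined[i][j])
               for i in range(2) for j in range(2))
print(max_diff)
```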



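EXAMPLE 2 Composition Is Not Commutative

Let $T_1: R^2 \to R^2$ be the reflection about the line y = x, and let $T_2: R^2 \to R^2$ be the orthogonal
projection on the y-axis. Figure 4.10.3 illustrates graphically that $T_1 \circ T_2$ and $T_2 \circ T_1$ have
different effects on a vector $\mathbf{x}$. This same conclusion can be reached by showing that the standard
matrices for $T_1$ and $T_2$ do not commute:

$[T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$

$[T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$

so $[T_1 \circ T_2] \ne [T_2 \circ T_1]$.

Figure 4.10.3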
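To see the order dependence of Example 2 on a concrete vector, the following sketch applies both compositions to the sample vector (3, 2). The vector and the helper names are assumptions made for illustration; the two matrices are the standard matrices of the reflection about y = x and the orthogonal projection on the y-axis.

```python
def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def mat_mul(P, Q):
    """Return the product PQ of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


T1 = [[0, 1], [1, 0]]   # reflection about the line y = x
T2 = [[0, 0], [0, 1]]   # orthogonal projection on the y-axis

T1_after_T2 = mat_mul(T1, T2)   # standard matrix of T1 o T2 (T2 acts first)
T2_after_T1 = mat_mul(T2, T1)   # standard matrix of T2 o T1 (T1 acts first)

x = [3, 2]  # sample vector, chosen only for illustration
print(mat_vec(T1_after_T2, x))  # project, then reflect
print(mat_vec(T2_after_T1, x))  # reflect, then project
```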



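EXAMPLE 3 Composition of Two Reflections

Let $T_1: R^2 \to R^2$ be the reflection about the y-axis, and let $T_2: R^2 \to R^2$ be the reflection about the
x-axis. In this case $T_1 \circ T_2$ and $T_2 \circ T_1$ are the same; both map every vector $\mathbf{x} = (x, y)$ into its
negative $-\mathbf{x} = (-x, -y)$ (Figure 4.10.4):

$(T_1 \circ T_2)(x, y) = T_1(x, -y) = (-x, -y)$

$(T_2 \circ T_1)(x, y) = T_2(-x, y) = (-x, -y)$

The equality of $T_1 \circ T_2$ and $T_2 \circ T_1$ can also be deduced by showing that the standard matrices for
$T_1$ and $T_2$ commute:

$[T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

$[T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

The operator $T(\mathbf{x}) = -\mathbf{x}$ on $R^2$ or $R^3$ is called the reflection about the origin. As the foregoing
computations show, the standard matrix for this operator on $R^2$ is

$[T] = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

Figure 4.10.4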



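EXAMPLE 4 Composition of Three Transformations

Find the standard matrix for the operator $T: R^3 \to R^3$ that first rotates a vector counterclockwise
about the z-axis through an angle $\theta$, then reflects the resulting vector about the yz-plane, and then
projects that vector orthogonally onto the xy-plane.

Solution The operator T can be expressed as the composition

$T = T_3 \circ T_2 \circ T_1$

where $T_1$ is the rotation about the z-axis, $T_2$ is the reflection about the yz-plane, and $T_3$ is the
orthogonal projection on the xy-plane. From Tables 6, 2, and 4 of Section 4.9, the standard
matrices for these operators are

$[T_1] = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

Thus, it follows from 5 that the standard matrix for T is

$[T] = [T_3][T_2][T_1] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -\cos\theta & \sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 \end{bmatrix}$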
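The three-factor product in Example 4 can be reproduced numerically. The sketch below is a check under stated assumptions: the test angle of 40° is arbitrary, and `mat_mul` is a helper name introduced here.

```python
import math


def mat_mul(P, Q):
    """Return the product PQ of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


theta = math.radians(40)  # arbitrary test angle
c, s = math.cos(theta), math.sin(theta)

T1 = [[c, -s, 0], [s, c, 0], [0, 0, 1]]   # rotation about the z-axis
T2 = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]   # reflection about the yz-plane
T3 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]    # orthogonal projection on the xy-plane

# T1 acts first, so it is the rightmost factor: [T] = [T3][T2][T1].
T = mat_mul(T3, mat_mul(T2, T1))

# Closed form obtained in the example.
expected = [[-c, s, 0], [s, c, 0], [0, 0, 0]]
max_diff = max(abs(T[i][j] - expected[i][j])
               for i in range(3) for j in range(3))
print(max_diff)
```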









One-to-One Matrix Transformations 

Our next objective is to establish a link between the invertibility of a matrix $A$ and properties of the
corresponding matrix transformation $T_A$.

DEFINITION 1

A matrix transformation $T_A: R^n \to R^m$ is said to be one-to-one if $T_A$ maps distinct vectors (points) in $R^n$
into distinct vectors (points) in $R^m$.

(See Figure 4.10.5). This idea can be expressed in various ways. For example, you should be able to see that the
following are just restatements of Definition 1:

1. $T_A$ is one-to-one if for each vector $\mathbf{b}$ in the range of $A$ there is exactly one vector $\mathbf{x}$ in $R^n$ such that $T_A\mathbf{x} = \mathbf{b}$.

2. $T_A$ is one-to-one if the equality $T_A(\mathbf{u}) = T_A(\mathbf{v})$ implies that $\mathbf{u} = \mathbf{v}$.

Figure 4.10.5 (panels: One-to-one; Not one-to-one)

Rotation operators on $R^2$ are one-to-one since distinct vectors that are rotated through the same angle have
distinct images (Figure 4.10.6). In contrast, the orthogonal projection of $R^3$ on the xy-plane is not one-to-one
because it maps distinct points on the same vertical line into the same point (Figure 4.10.7).

Figure 4.10.6 Distinct vectors $\mathbf{u}$ and $\mathbf{v}$ are rotated into distinct vectors $T(\mathbf{u})$ and $T(\mathbf{v})$

Figure 4.10.7 The distinct points P and Q are mapped into the same point M



The following theorem establishes a fundamental relationship between the invertibility of a matrix and properties 
of the corresponding matrix transformation. 



THEOREM 4.10.1

If $A$ is an $n \times n$ matrix and $T_A: R^n \to R^n$ is the corresponding matrix operator, then the following
statements are equivalent.

(a) $A$ is invertible.

(b) The range of $T_A$ is $R^n$.

(c) $T_A$ is one-to-one.

Proof We will establish the chain of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).

(a) ⇒ (b) Assume that $A$ is invertible. By parts (a) and (e) of Theorem 4.8.10, the system $A\mathbf{x} = \mathbf{b}$ is consistent
for every $n \times 1$ matrix $\mathbf{b}$ in $R^n$. This implies that $T_A$ maps $\mathbf{x}$ into the arbitrary vector $\mathbf{b}$ in $R^n$, which in turn
implies that the range of $T_A$ is all of $R^n$.

(b) ⇒ (c) Assume that the range of $T_A$ is $R^n$. This implies that for every vector $\mathbf{b}$ in $R^n$ there is some vector $\mathbf{x}$
in $R^n$ for which $T_A(\mathbf{x}) = \mathbf{b}$ and hence that the linear system $A\mathbf{x} = \mathbf{b}$ is consistent for every vector $\mathbf{b}$ in $R^n$. But
the equivalence of parts (e) and (f) of Theorem 4.8.10 implies that $A\mathbf{x} = \mathbf{b}$ has a unique solution for every vector
$\mathbf{b}$ in $R^n$ and hence that for every vector $\mathbf{b}$ in the range of $T_A$ there is exactly one vector $\mathbf{x}$ in $R^n$ such that
$T_A(\mathbf{x}) = \mathbf{b}$.

(c) ⇒ (a) Assume that $T_A$ is one-to-one. Thus, if $\mathbf{b}$ is a vector in the range of $T_A$, there is a unique vector $\mathbf{x}$ in
$R^n$ for which $T_A(\mathbf{x}) = \mathbf{b}$. We leave it for you to complete the proof using Exercise 30.



EXAMPLE 5 Properties of a Rotation Operator

As indicated in Figure 4.10.6, the operator $T: R^2 \to R^2$ that rotates vectors in $R^2$ through an angle
$\theta$ is one-to-one. Confirm that [T] is invertible in accordance with Theorem 4.10.1.

Solution From Table 5 of Section 4.9 the standard matrix for T is

$[T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$

This matrix is invertible because

$\det[T] = \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix} = \cos^2\theta + \sin^2\theta = 1 \ne 0$



EXAMPLE 6 Properties of a Projection Operator

As indicated in Figure 4.10.7, the operator $T: R^3 \to R^3$ that projects each vector in $R^3$
orthogonally on the xy-plane is not one-to-one. Confirm that [T] is not invertible in accordance
with Theorem 4.10.1.

Solution From Table 4 of Section 4.9 the standard matrix for T is

$[T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

This matrix is not invertible since $\det[T] = 0$.
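The link between a zero determinant and the failure to be one-to-one can be made concrete. In the sketch below, the points p and q are sample points on the same vertical line (an assumption made for illustration), and `det3` and `mat_vec` are helper names introduced here.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Standard matrix of the orthogonal projection of R^3 on the xy-plane.
P = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]

# Two distinct sample points on the same vertical line.
p, q = [1, 2, 3], [1, 2, -5]

determinant = det3(P)
same_image = mat_vec(P, p) == mat_vec(P, q)
print(determinant, same_image)
```

The determinant is zero and the two distinct points share one image, so the check agrees with both parts of Example 6.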



Inverse of a One-to-One Matrix Operator 

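If $T_A: R^n \to R^n$ is a one-to-one matrix operator, then it follows from Theorem 4.10.1 that $A$ is invertible. The
matrix operator

$T_{A^{-1}}: R^n \to R^n$

that corresponds to $A^{-1}$ is called the inverse operator or (more simply) the inverse of $T_A$. This terminology is
appropriate because $T_A$ and $T_{A^{-1}}$ cancel the effect of each other in the sense that if $\mathbf{x}$ is any vector in $R^n$, then

$T_A(T_{A^{-1}}(\mathbf{x})) = AA^{-1}\mathbf{x} = I\mathbf{x} = \mathbf{x}$

$T_{A^{-1}}(T_A(\mathbf{x})) = A^{-1}A\mathbf{x} = I\mathbf{x} = \mathbf{x}$

or, equivalently,

$T_A \circ T_{A^{-1}} = T_{AA^{-1}} = T_I, \quad T_{A^{-1}} \circ T_A = T_{A^{-1}A} = T_I$

From a more geometric viewpoint, if $\mathbf{w}$ is the image of $\mathbf{x}$ under $T_A$, then $T_{A^{-1}}$ maps $\mathbf{w}$ back into $\mathbf{x}$, since
$T_{A^{-1}}(\mathbf{w}) = T_{A^{-1}}(T_A(\mathbf{x})) = \mathbf{x}$ (Figure 4.10.8).

Figure 4.10.8

Before considering examples, it will be helpful to touch on some notational matters. If $T_A: R^n \to R^n$ is a
one-to-one matrix operator, and if $T_A^{-1}: R^n \to R^n$ is its inverse, then the standard matrices for these operators
are related by the equation

$T_A^{-1} = T_{A^{-1}}$ (6)

In cases where it is preferable not to assign a name to the matrix, we will write this equation as

$[T^{-1}] = [T]^{-1}$ (7)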



EXAMPLE 7 Standard Matrix for $T^{-1}$

Let $T: R^2 \to R^2$ be the operator that rotates each vector in $R^2$ through the angle $\theta$, so from Table 5
of Section 4.9,

$[T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$

It is evident geometrically that to undo the effect of T, one must rotate each vector in $R^2$ through
the angle $-\theta$. But this is exactly what the operator $T^{-1}$ does, since the standard matrix for $T^{-1}$ is

$[T^{-1}] = [T]^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix}$

(verify), which is the standard matrix for a rotation through the angle $-\theta$.





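EXAMPLE 8 Finding $T^{-1}$

Show that the operator $T: R^2 \to R^2$ defined by the equations

$w_1 = 2x_1 + x_2$
$w_2 = 3x_1 + 4x_2$

is one-to-one, and find $T^{-1}(w_1, w_2)$.

Solution The matrix form of these equations is

$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

so the standard matrix for T is

$[T] = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}$

This matrix is invertible (so T is one-to-one) and the standard matrix for $T^{-1}$ is

$[T^{-1}] = [T]^{-1} = \begin{bmatrix} \frac{4}{5} & -\frac{1}{5} \\ -\frac{3}{5} & \frac{2}{5} \end{bmatrix}$

Thus

$[T^{-1}] \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \frac{4}{5} & -\frac{1}{5} \\ -\frac{3}{5} & \frac{2}{5} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \frac{4}{5}w_1 - \frac{1}{5}w_2 \\ -\frac{3}{5}w_1 + \frac{2}{5}w_2 \end{bmatrix}$

from which we conclude that

$T^{-1}(w_1, w_2) = \left(\tfrac{4}{5}w_1 - \tfrac{1}{5}w_2,\ -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2\right)$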
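The computation in Example 8 can be replayed with exact rational arithmetic. The following sketch uses the standard 2 × 2 inverse formula; the sample vector x and the helper names `inverse_2x2` and `mat_vec` are assumptions made for illustration.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula, in exact fractions."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible, so the operator is not one-to-one")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


T = [[2, 1], [3, 4]]           # standard matrix from Example 8
T_inv = inverse_2x2(T)         # entries 4/5, -1/5, -3/5, 2/5

x = [7, -2]                    # sample vector, chosen only for illustration
w = mat_vec(T, x)              # image of x under T
recovered = mat_vec(T_inv, w)  # applying the inverse operator returns x
print(w, recovered)
```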



Linearity Properties 

Up to now we have focused exclusively on matrix transformations from $R^n$ to $R^m$. However, these are not the
only kinds of transformations from $R^n$ to $R^m$. For example, if $f_1, f_2, \ldots, f_m$ are any functions of the n
variables $x_1, x_2, \ldots, x_n$, then the equations

$w_1 = f_1(x_1, x_2, \ldots, x_n)$
$w_2 = f_2(x_1, x_2, \ldots, x_n)$
$\vdots$
$w_m = f_m(x_1, x_2, \ldots, x_n)$

define a transformation $T: R^n \to R^m$ that maps the vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ into the vector $(w_1, w_2, \ldots, w_m)$.
But it is only in the case where these equations are linear that T is a matrix transformation. The question that we
will now consider is this:

Question

Are there algebraic properties of a transformation $T: R^n \to R^m$ that can be used to determine whether T is
a matrix transformation?



The answer is provided by the following theorem. 



THEOREM 4.10.2

$T: R^n \to R^m$ is a matrix transformation if and only if the following relationships hold for all vectors $\mathbf{u}$
and $\mathbf{v}$ in $R^n$ and for every scalar k:

(i) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]

(ii) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]

Proof If T is a matrix transformation, then properties (i) and (ii) follow respectively from parts (c) and (b) of
Theorem 4.9.1.

Conversely, assume that properties (i) and (ii) hold. We must show that there exists an $m \times n$ matrix $A$ such that

$T(\mathbf{x}) = A\mathbf{x}$

for every vector $\mathbf{x}$ in $R^n$. As a first step, recall from Formula (10) of Section 4.9 that the additivity and
homogeneity properties imply that

$T(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1T(\mathbf{u}_1) + k_2T(\mathbf{u}_2) + \cdots + k_rT(\mathbf{u}_r)$ (9)

for all scalars $k_1, k_2, \ldots, k_r$ and all vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ in $R^n$. Let $A$ be the matrix

$A = [T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)]$

in which $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$.

It follows from Theorem 1.3.1 that $A\mathbf{x}$ is a linear combination of the columns of $A$ in which the successive
coefficients are the entries $x_1, x_2, \ldots, x_n$ of $\mathbf{x}$. That is,

$A\mathbf{x} = x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n)$

Using 9 we can rewrite this as

$A\mathbf{x} = T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) = T(\mathbf{x})$

which completes the proof.
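The construction used in the proof, placing the images of the standard basis vectors in the columns of $A$, can be sketched in a few lines of Python. The transformation T below reuses the equations of Example 8; the translation `shift`, the sample vectors, and the helper names are assumptions introduced only for illustration.

```python
def T(x):
    """A linear transformation on R^2 (the equations of Example 8)."""
    x1, x2 = x
    return [2 * x1 + x2, 3 * x1 + 4 * x2]


def standard_matrix(transform, n):
    """Build A = [T(e_1) | ... | T(e_n)] from the images of the standard basis."""
    images = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        images.append(transform(e_j))
    # images[j] is column j of A; transpose to obtain a list of rows.
    return [list(row) for row in zip(*images)]


def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


A = standard_matrix(T, 2)
x = [5, -3]                       # sample vector
agree = mat_vec(A, x) == T(x)     # Ax reproduces T(x)


def shift(x):
    """A translation; it is not linear, so it fails additivity."""
    return [x[0] + 1, x[1]]


u, v = [1, 0], [0, 1]
additive = shift([u[0] + v[0], u[1] + v[1]]) == [a + b for a, b in zip(shift(u), shift(v))]
print(A, agree, additive)
```

A single failed check such as `additive` is enough to rule out a matrix transformation, whereas agreement on a few sample vectors only supports linearity and does not prove it.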



The additivity and homogeneity properties in Theorem 4.10.2 are called linearity conditions, and a
transformation that satisfies these conditions is called a linear transformation. Using this terminology Theorem
4.10.2 can be restated as follows.

THEOREM 4.10.3

Every linear transformation from $R^n$ to $R^m$ is a matrix transformation, and conversely, every matrix
transformation from $R^n$ to $R^m$ is a linear transformation.



More on the Equivalence Theorem 

As our final result in this section, we will add parts (b) and (c) of Theorem 4.10.1 to Theorem 4.8.10. 

THEOREM 4.10.4 Equivalent Statements

If $A$ is an $n \times n$ matrix, then the following statements are equivalent.

(a) $A$ is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \ne 0$.

(h) The column vectors of $A$ are linearly independent.

(i) The row vectors of $A$ are linearly independent.

(j) The column vectors of $A$ span $R^n$.

(k) The row vectors of $A$ span $R^n$.

(l) The column vectors of $A$ form a basis for $R^n$.

(m) The row vectors of $A$ form a basis for $R^n$.

(n) $A$ has rank n.

(o) $A$ has nullity 0.

(p) The orthogonal complement of the null space of $A$ is $R^n$.

(q) The orthogonal complement of the row space of $A$ is $\{\mathbf{0}\}$.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.



Concept Review 

• Composition of matrix transformations 

• Reflection about the origin 

• One-to-one transformation 

• Inverse of a matrix operator 

• Linearity conditions 

• Linear transformation 

• Equivalent characterizations of invertible matrices 

Skills 

• Find the standard matrix for a composition of matrix transformations. 

• Determine whether a matrix operator is one-to-one; if it is, then find the inverse operator. 

• Determine whether a transformation is a linear transformation.



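Exercise Set 4.10

In Exercises 1-2, let $T_A$ and $T_B$ be the operators whose standard matrices are given. Find the standard matrices
for $T_B \circ T_A$ and $T_A \circ T_B$.

1. $A = \begin{bmatrix} 1 & -2 & 0 \\ 4 & 1 & -3 \\ 5 & 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -3 & 3 \\ 5 & 0 & 1 \\ 6 & 1 & 7 \end{bmatrix}$

Answer:

$T_B \circ T_A = \begin{bmatrix} 5 & -1 & 21 \\ 10 & -8 & 4 \\ 45 & 3 & 25 \end{bmatrix}, \quad T_A \circ T_B = \begin{bmatrix} -8 & -3 & 1 \\ -5 & -15 & -8 \\ 44 & -11 & 45 \end{bmatrix}$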




S 


3 - 


•r 




4 


0 4" 






A = 


2 


0 


1 


. B = 


-1 


5 2 








4 


-3 


6 




2 - 


-3 8 







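3. Let $T_1(x_1, x_2) = (x_1 + x_2,\ x_1 - x_2)$ and $T_2(x_1, x_2) = (3x_1,\ 2x_1 + 4x_2)$.

(a) Find the standard matrices for $T_1$ and $T_2$.

(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.

(c) Use the matrices obtained in part (b) to find formulas for $T_1(T_2(x_1, x_2))$ and $T_2(T_1(x_1, x_2))$.

Answer:

(a) $[T_1] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} 3 & 0 \\ 2 & 4 \end{bmatrix}$

(b) $[T_2 \circ T_1] = \begin{bmatrix} 3 & 3 \\ 6 & -2 \end{bmatrix}, \quad [T_1 \circ T_2] = \begin{bmatrix} 5 & 4 \\ 1 & -4 \end{bmatrix}$

(c) $T_2(T_1(x_1, x_2)) = (3x_1 + 3x_2,\ 6x_1 - 2x_2)$,
$T_1(T_2(x_1, x_2)) = (5x_1 + 4x_2,\ x_1 - 4x_2)$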

4. Let $T_1(x_1, x_2, x_3) = (4x_1,\ -2x_1 + x_2,\ -x_1 - 3x_2)$ and $T_2(x_1, x_2, x_3) = (x_1 + 2x_2,\ -x_3,\ 4x_1 - x_3)$.

(a) Find the standard matrices for $T_1$ and $T_2$.

(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.

(c) Use the matrices obtained in part (b) to find formulas for $T_1(T_2(x_1, x_2, x_3))$ and $T_2(T_1(x_1, x_2, x_3))$.

5. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation of 90°, followed by a reflection about the line y = x. 

(b) An orthogonal projection on the j-axis, followed by a contraction with factor k = ^. 

(c) A reflection about the x-axis, followed by a dilation with factor k = 3.

Answer: 

(a) 



(b) 



(c) 



0 0 

' ^ 

3 0 
0 -3 



6. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation of 60°, followed by an orthogonal projection on the x-axis, followed by a reflection about the
line y = x.

(b) A dilation with factor k = 2, followed by a rotation of 45°, followed by a reflection about the y-axis.

(c) A rotation of 15°, followed by a rotation of 105°, followed by a rotation of 60°.

7. Find the standard matrix for the stated composition in $R^3$.

(a) A reflection about the yz-plane, followed by an orthogonal projection on the xz-plane.

(b) A rotation of 45° about the y-axis, followed by a dilation with factor $k = \sqrt{2}$.

(c) An orthogonal projection on the xy-plane, followed by a reflection about the yz-plane.

Answer:

(a) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \\ -1 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$



8. Find the standard matrix for the stated composition in $R^3$.

(a) A rotation of 30° about the x-axis, followed by a rotation of 30° about the z-axis, followed by a
contraction with factor $k = \frac{1}{4}$.

(b) A reflection about the xy-plane, followed by a reflection about the xz-plane, followed by an orthogonal
projection on the yz-plane.

(c) A rotation of 270° about the x-axis, followed by a rotation of 90° about the y-axis, followed by a rotation
of 180° about the z-axis.

9. Determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the orthogonal projection on
the y-axis.

(b) $T_1: R^2 \to R^2$ is the rotation through an angle $\theta_1$, and $T_2: R^2 \to R^2$ is the rotation through an angle $\theta_2$.

(c) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the rotation through an angle $\theta$.

Answer:

(a) $T_1 \circ T_2 = T_2 \circ T_1$

(b) $T_1 \circ T_2 = T_2 \circ T_1$

(c) $T_1 \circ T_2 \ne T_2 \circ T_1$

10. Determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: R^3 \to R^3$ is a dilation by a factor k, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through an angle
$\theta$.

(b) $T_1: R^3 \to R^3$ is the rotation about the x-axis through an angle $\theta_1$, and $T_2: R^3 \to R^3$ is the rotation about
the z-axis through an angle $\theta_2$.

11. By inspection, determine whether the matrix operator is one-to-one.

(a) the orthogonal projection on the x-axis in $R^2$

(b) the reflection about the y-axis in $R^2$

(c) the reflection about the line y = x in $R^2$

(d) a contraction with factor k > 0 in $R^2$

(e) a rotation about the z-axis in $R^3$

(f) a reflection about the xy-plane in $R^3$

(g) a dilation with factor k > 0 in $R^3$



Answer: 

(a) Not one-to-one 

(b) One-to-one 

(c) One-to-one 

(d) One-to-one 

(e) One-to-one 

(f) One-to-one 

(g) One-to-one 

12. Find the standard matrix for the matrix operator defined by the equations, and use Theorem 4.10.4 to 
determine whether the operator is one-to-one. 

(a) wi = 8jr 1+4^:2 
W2 = 2x\+ X2 

(b) wi =2:^1 -37:2 

(c) = -XI + 3x2 + 27:3 
W2 = 2x1 -f 47:3 
W3 = x\ \ 3x2 + 6x3 

(d) - A"! t 2x2 \ 3:^3 
W2 = 2j:i + 5^2 + 37:3 
W3= XI +Sx2 

13. Determine whether the matrix operator $T: R^2 \to R^2$ defined by the equations is one-to-one; if so, find the
standard matrix for the inverse operator, and find $T^{-1}(w_1, w_2)$.



(a) wi = :^1 + 27:2 
W2= + X2 

(b) y^i = -6x2 

W2= -2x1+3^:2 

(c) wi=-^2 

(d) = 3x1 
W2 = -5x1 

Answer: 



(a) 



One-to-one; 



i 
3 
i 
3 



3 




(b) Not one-to-one 




(d) Not one-to-one 

14. Determine whether the matrix operator $T: R^3 \to R^3$ defined by the equations is one-to-one; if so, find the
standard matrix for the inverse operator, and find $T^{-1}(w_1, w_2, w_3)$.

(a) >*^1 = ^1 -2^:2 + 2x3 

(b) wi = x\ ^3x2^ Ax2 
W2= -^1 + :t2+ ^3 

vi?3 = — 2x2 + 5:^3 

(c) vvi = ^1 f- 4x2 -:^3 
W2 = 27:i +7:^2 + ^3 
vi?3 = XI + 3x2 

(d) vi^l = XI +2x2+ ^3 
W2= - 2^1 i X2 I 4;v3 
W3 = 7x1 +4x2 — 5x3 

15. By inspection, find the inverse of the given one-to-one matrix operator.

(a) The reflection about the x-axis in $R^2$.

(b) The rotation through an angle of $\pi/4$ in $R^2$.

(c) The dilation by a factor of 3 in $R^2$.

(d) The reflection about the yz-plane in $R^3$.

(e) The contraction by a factor of $\frac{1}{5}$ in $R^3$.

Answer:

(a) Reflection about the x-axis

(b) Rotation through the angle $-\pi/4$

(c) Contraction by a factor of $\frac{1}{3}$

(d) Reflection about the yz-plane

(e) Dilation by a factor of 5

In Exercises 16-17, use Theorem 4.10.2 to determine whether $T: R^2 \to R^2$ is a matrix operator.



'•(a) Tix,y) 




(c) T{x,y) 

(d) T{x.y) 



(a) T{x,y) 

(b) TiK.y) 



i:2x+y,x-y) 



(c) T{x,y) = iy,y) 



Answer: 

(a) Matrix operator 

(b) Not a matrix operator 

(c) Matrix operator 

(d) Not a matrix operator 

In Exercises 18-19, use Theorem 4.10.2 to determine whether $T: R^3 \to R^2$ is a matrix transformation.

18. (a) $T(x, y, z) = (x,\ x + y + z)$
(b) $T(x, y, z) = (1, 1)$

19. (a) $T(x, y, z) = (0, 0)$

(b) nx,y.z) = (3x^4y,2x^5z) 

Answer: 

(a) Matrix transformation 

(b) Matrix transformation 

20. In each part, use Theorem 4.10.3 to find the standard matrix for the matrix operator from the images of the
standard basis vectors.

(a) The reflection operators on $R^2$ in Table 1 of Section 4.9.

(b) The reflection operators on $R^3$ in Table 2 of Section 4.9.

(c) The projection operators on $R^2$ in Table 3 of Section 4.9.

(d) The projection operators on $R^3$ in Table 4 of Section 4.9.

(e) The rotation operators on $R^2$ in Table 5 of Section 4.9.

(f) The dilation and contraction operators on $R^3$ in Table 8 of Section 4.9.

21. Find the standard matrix for the given matrix operator.

(a) $T: R^2 \to R^2$ projects a vector orthogonally onto the x-axis and then reflects that vector about the y-axis.

(b) $T: R^2 \to R^2$ reflects a vector about the line y = x and then reflects that vector about the x-axis.

(c) $T: R^2 \to R^2$ dilates a vector by a factor of 3, then reflects that vector about the line y = x, and then
projects that vector orthogonally onto the y-axis.

Answer:

(a) $\begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 \\ 3 & 0 \end{bmatrix}$



22. Find the standard matrix for the given matrix operator. 

(^) T:R^ — ► reflects a vector about the xz-plane and then contracts that vector by a factor of ^. 

(b) $T: R^3 \to R^3$ projects a vector orthogonally onto the xz-plane and then projects that vector orthogonally
onto the xy-plane.

(c) $T: R^3 \to R^3$ reflects a vector about the xy-plane, then reflects that vector about the xz-plane, and then
reflects that vector about the yz-plane.

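23. Let $T_A: R^3 \to R^3$ be multiplication by

$A = \begin{bmatrix} -1 & 3 & 0 \\ 2 & 1 & 2 \\ 4 & 5 & -3 \end{bmatrix}$

and let $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ be the standard basis vectors for $R^3$. Find the following vectors by inspection.

(a) $T_A(\mathbf{e}_1)$, $T_A(\mathbf{e}_2)$, and $T_A(\mathbf{e}_3)$

(b) $T_A(\mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3)$

(c) $T_A(7\mathbf{e}_3)$

Answer:

(a) $T_A(\mathbf{e}_1) = (-1, 2, 4)$, $T_A(\mathbf{e}_2) = (3, 1, 5)$, $T_A(\mathbf{e}_3) = (0, 2, -3)$

(b) $T_A(\mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3) = (2, 5, 6)$

(c) $T_A(7\mathbf{e}_3) = (0, 14, -21)$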

24. Determine whether multiplication by ^ is a one-to-one matrix transformation. 



(a) 

A = 

(c) 

A = 



1 -1 

2 0 

3 -4 

1 2 3 
-1 0 -4 

1 2 1 

0 1 1 

1 1 0 
1 0 -1 



25. (a) Is a composition of one-to-one matrix transformations one-to-one? Justify your conclusion.

(b) Can the composition of a one-to-one matrix transformation and a matrix transformation that is not 
one-to-one be one-to-one? Account for both possible orders of composition and justify your conclusion. 



Answer: 



(a) Yes 

(b) Yes 



26. Show that $T(x, y) = (0, 0)$ defines a matrix operator on $R^2$ but $T(x, y) = (1, 1)$ does not.

27. (a) Prove: If $T: R^n \to R^m$ is a matrix transformation, then $T(\mathbf{0}) = \mathbf{0}$; that is, T maps the zero vector in $R^n$
into the zero vector in $R^m$.

(b) The converse of this is not true. Find an example of a function that satisfies $T(\mathbf{0}) = \mathbf{0}$ but is not a matrix
transformation.



28. Prove: An $n \times n$ matrix $A$ is invertible if and only if the linear system $A\mathbf{x} = \mathbf{w}$ has exactly one solution for
every vector $\mathbf{w}$ in $R^n$ for which the system is consistent.

29. Let $A$ be an $n \times n$ matrix such that $\det(A) = 0$, and let $T: R^n \to R^n$ be multiplication by $A$.

(a) What can you say about the range of the matrix operator T? Give an example that illustrates your conclusion.

(b) What can you say about the number of vectors that T maps into $\mathbf{0}$?

Answer:

(a) The range of T is a proper subset of $R^n$.

(b) T must map infinitely many vectors to $\mathbf{0}$.

30. Prove: If the matrix transformation $T_A: R^n \to R^n$ is one-to-one, then $A$ is invertible.

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer.

(a) If $T: R^n \to R^m$ and $T(\mathbf{0}) = \mathbf{0}$, then T is a matrix transformation.

Answer:

False

(b) If $T: R^n \to R^m$ and $T(c_1\mathbf{x} + c_2\mathbf{y}) = c_1T(\mathbf{x}) + c_2T(\mathbf{y})$ for all scalars $c_1$ and $c_2$ and all vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^n$,
then T is a matrix transformation.

Answer:

True

(c) If $T: R^n \to R^m$ is a one-to-one matrix transformation, then there are no distinct vectors $\mathbf{x}$ and $\mathbf{y}$ for which
$T(\mathbf{x} - \mathbf{y}) = \mathbf{0}$.

Answer:

True

(d) If $T: R^n \to R^m$ is a matrix transformation and m > n, then T is one-to-one.
Answer:

False

(e) If $T: R^n \to R^m$ is a matrix transformation and m = n, then T is one-to-one.
Answer:

False

(f) If $T: R^n \to R^m$ is a matrix transformation and m < n, then T is one-to-one.
Answer:

False

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



4.11 Geometry of Matrix Operators on $R^2$

In this optional section we will discuss matrix operators on $R^2$ in a little more depth. The ideas that we will develop here
have important applications to computer graphics.

Transformations of Regions

In Section 4.9 we focused on the effect that a matrix operator has on individual vectors in $R^2$ and $R^3$. However, it is also
important to understand how such operators affect the shapes of regions. For example, Figure 4.11.1 shows a famous
picture of Albert Einstein and three computer-generated modifications of that image that result from matrix operators on
$R^2$. The original picture was scanned and then digitized to decompose it into a rectangular array of pixels. The pixels
were then transformed as follows:

• The program MATLAB was used to assign coordinates and a gray level to each pixel. 

• The coordinates of the pixels were transformed by matrix multiplication. 

• The pixels were then assigned their original gray levels to produce the transformed picture. 
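The three steps above can be imitated on a toy scale. The sketch below is hypothetical throughout: the four-pixel "picture", its gray levels, and the horizontal compression by a factor of 1/2 are invented for illustration and do not reproduce the MATLAB processing described in the text.

```python
# Step 1: a hypothetical picture, stored as pixel coordinates with gray levels.
pixels = {(0, 0): 10, (2, 0): 80, (0, 2): 160, (2, 2): 250}

# Standard matrix for a compression in the x-direction with factor 1/2.
compress_x = [[0.5, 0],
              [0, 1]]


def apply(M, p):
    """Multiply the 2x2 matrix M by the point p = (x, y)."""
    x, y = p
    return (M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)


# Steps 2 and 3: transform each coordinate pair and keep its original gray level.
transformed = {apply(compress_x, p): gray for p, gray in pixels.items()}
print(transformed)
```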






Figure 4.11.1



The overall effect of a matrix operator on $R^2$ can often be ascertained by graphing the images of the vertices
(0, 0), (1, 0), (0, 1), and (1, 1) of the unit square (Figure 4.11.2). Table 1 shows the effect that some of the matrix
operators studied in Section 4.9 have on the unit square. For clarity, we have shaded a portion of the original square and
its corresponding image.

Figure 4.11.2 (panels: unit square; unit square rotated; unit square reflected about the y-axis; unit square reflected about the line y = x; unit square projected onto the x-axis)



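Table 1

Operator | Standard Matrix | Effect on the Unit Square
Reflection about the y-axis | $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$ | (1, 1) is mapped to (-1, 1)
Reflection about the x-axis | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ | (1, 1) is mapped to (1, -1)
Reflection about the line y = x | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ | (1, 1) is mapped to (1, 1)
Counterclockwise rotation through an angle $\theta$ | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ | [figure not reproduced]
Compression in the x-direction by a factor of k (0 < k < 1) | $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$ | (1, 1) is mapped to (k, 1)
Expansion in the x-direction by a factor of k (k > 1) | $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$ | (1, 1) is mapped to (k, 1)
Shear in the x-direction with factor k > 0 | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ | (x, y) is mapped to (x + ky, y)
Shear in the ... [remainder of the table not recoverable]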



EXAMPLE 1 Transforming with Diagonal Matrices



Suppose that the xy-plane first is compressed or expanded by a factor of $k_1$ in the x-direction and then is
compressed or expanded by a factor of $k_2$ in the y-direction. Find a single matrix operator that performs
both operations.



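Solution The standard matrices for the two operations are

$$\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} \ \text{(x-compression or expansion)}, \qquad \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix} \ \text{(y-compression or expansion)}$$

Thus, the standard matrix for the composition of the x-operation followed by the y-operation is

$$A = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix} \begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} \tag{1}$$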



This shows that multiplication by a diagonal 2 x 2 matrix compresses or expands the plane in the
x-direction and also in the y-direction. In the special case where $k_1$ and $k_2$ are the same, say $k_1 = k_2 = k$,
Formula 1 simplifies to

$$A = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$$

which is a contraction or a dilation (Table 7 of Section 4.9).
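The composition in Example 1 can be checked numerically. The sketch below is illustrative only: the helper names `mat_mul`, `scale_x`, and `scale_y` and the sample factors $k_1 = 1/2$, $k_2 = 3$ are assumptions chosen for the demonstration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def scale_x(k):
    """Compression or expansion by a factor of k in the x-direction."""
    return ((k, 0), (0, 1))


def scale_y(k):
    """Compression or expansion by a factor of k in the y-direction."""
    return ((1, 0), (0, k))


k1, k2 = Fraction(1, 2), 3

# The x-operation is applied first, so its matrix stands on the right.
composite = mat_mul(scale_y(k2), scale_x(k1))
print(composite)
```

For these two particular operators the order does not matter, because diagonal matrices commute; that is not true of matrix operators in general.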



EXAMPLE 2 Finding Matrix Operators

(a) Find the standard matrix for the operator on $R^2$ that first shears by a factor of 2 in the x-direction and
then reflects the result about the line y = x. Sketch the image of the unit square under this operator.

(b) Find the standard matrix for the operator on $R^2$ that first reflects about y = x and then shears by a
factor of 2 in the x-direction. Sketch the image of the unit square under this operator.

(c) Confirm that the shear and the reflection in parts (a) and (b) do not commute. 



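Solution

(a) The standard matrix for the shear is

$$A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$

and for the reflection is

$$A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Thus, the standard matrix for the shear followed by the reflection is

$$A_2 A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}$$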

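(b) The standard matrix for the reflection followed by the shear is

$$A_1 A_2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}$$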



(c) The computations in Solutions (a) and (b) show that $A_1 A_2 \neq A_2 A_1$, so the standard matrices, and
hence the operators, do not commute. The same conclusion follows from Figures 4.11.3 and 4.11.4,
since the two operators produce different images of the unit square.
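The two products in Example 2 are easy to verify by machine. The following is a minimal sketch; the helper name `mat_mul` and the variable names are hypothetical, while the matrices are the shear and reflection from the example.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


A1 = ((1, 2), (0, 1))  # shear by a factor of 2 in the x-direction
A2 = ((0, 1), (1, 0))  # reflection about the line y = x

shear_then_reflect = mat_mul(A2, A1)  # first operator stands on the right
reflect_then_shear = mat_mul(A1, A2)

print(shear_then_reflect)
print(reflect_then_shear)
print("commute:", shear_then_reflect == reflect_then_shear)
```

The operator applied first is always the right-hand factor, which is why "shear, then reflect" is the product $A_2 A_1$ rather than $A_1 A_2$.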



Figure 4.11.3 [image not reproduced]

Figure 4.11.4 [image not reproduced; surviving panel labels: shear in the x-direction with k = 2; reflection about y = x]



Geometry of One-to-One Matrix Operators 

We will now turn our attention to one-to-one matrix operators on $R^2$, which are important because they map distinct
points into distinct points. Recall from Theorem 4.10.4 (the Equivalence Theorem) that a matrix transformation $T_A$ is
one-to-one if and only if $A$ can be expressed as a product of elementary matrices. Thus, we can analyze the effect of any
one-to-one transformation $T_A$ by first factoring the matrix $A$ into a product of elementary matrices, say

$$A = E_1 E_2 \cdots E_r$$

and then expressing $T_A$ as the composition

$$T_A = T_{E_1 E_2 \cdots E_r} = T_{E_1} \circ T_{E_2} \circ \cdots \circ T_{E_r} \tag{2}$$



The following theorem explains the geometric effect of matrix operators corresponding to elementary matrices. 



THEOREM 4.11.1

If $E$ is an elementary matrix, then $T_E: R^2 \to R^2$ is one of the following:

(a) A shear along a coordinate axis.

(b) A reflection about y = x.

(c) A compression along a coordinate axis.

(d) An expansion along a coordinate axis.

(e) A reflection about a coordinate axis.

(f) A compression or expansion along a coordinate axis followed by a reflection about a coordinate axis.



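Proof Because a 2 x 2 elementary matrix results from performing a single elementary row operation on the 2 x 2
identity matrix, such a matrix must have one of the following forms (verify):

$$\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$$

The first two matrices represent shears along coordinate axes, and the third represents a reflection about y = x. If $k > 0$,
the last two matrices represent compressions or expansions along coordinate axes, depending on whether $0 < k < 1$ or
$k > 1$. If $k < 0$, and if we express $k$ in the form $k = -k_1$, where $k_1 > 0$, then the last two matrices can be written as

$$\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3}$$

$$\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -k_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & k_1 \end{bmatrix} \tag{4}$$

Since $k_1 > 0$, the product in 3 represents a compression or expansion along the x-axis followed by a reflection about the
y-axis, and 4 represents a compression or expansion along the y-axis followed by a reflection about the x-axis. In the
case where $k = -1$, transformations 3 and 4 are simply reflections about the y-axis and x-axis, respectively.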



Since every invertible matrix is a product of elementary matrices, the following result follows from Theorem 4.11.1 and 
Formula 2. 

THEOREM 4.11.2

If $T_A: R^2 \to R^2$ is multiplication by an invertible matrix $A$, then the geometric effect of $T_A$ is the same as an
appropriate succession of shears, compressions, expansions, and reflections.

EXAMPLE 3 Analyzing the Geometric Effect of a Matrix Operator

Assuming that $k_1$ and $k_2$ are positive, express the diagonal matrix

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}$$

as a product of elementary matrices, and describe the geometric effect of multiplication by $A$ in terms of
compressions and expansions.

Solution From Example 1 we have

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix} \begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}$$

which shows that multiplication by $A$ has the geometric effect of compressing or expanding by a factor of
$k_1$ in the x-direction and then compressing or expanding by a factor of $k_2$ in the y-direction.



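EXAMPLE 4 Analyzing the Geometric Effect of a Matrix Operator

Express

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

as a product of elementary matrices, and then describe the geometric effect of multiplication by $A$ in terms
of shears, compressions, expansions, and reflections.

Solution $A$ can be reduced to $I$ as follows:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \to \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The three steps are: add $-3$ times the first row to the second; multiply the second row by $-\frac{1}{2}$; add $-2$ times
the second row to the first.

The three successive row operations can be performed by multiplying $A$ on the left successively by

$$E_1 = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{2} \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$$

Inverting these matrices and using Formula 4 of Section 1.5 yields

$$A = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$

Reading from right to left and noting that

$$\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$

it follows that the effect of multiplying by $A$ is equivalent to

1. shearing by a factor of 2 in the x-direction,

2. then expanding by a factor of 2 in the y-direction,

3. then reflecting about the x-axis,

4. then shearing by a factor of 3 in the y-direction.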
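The four steps can be checked by multiplying the corresponding matrices, with the first operation as the rightmost factor. This is a small sketch for verification only; the helper `mat_mul` and the variable names are hypothetical.

```python
from functools import reduce


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


shear_x_2 = ((1, 2), (0, 1))    # step 1: shear, factor 2, x-direction
expand_y_2 = ((1, 0), (0, 2))   # step 2: expansion, factor 2, y-direction
reflect_x = ((1, 0), (0, -1))   # step 3: reflection about the x-axis
shear_y_3 = ((1, 0), (3, 1))    # step 4: shear, factor 3, y-direction

# Written left to right, the last operation comes first in the product.
factors = [shear_y_3, reflect_x, expand_y_2, shear_x_2]
product = reduce(mat_mul, factors)
print(product)
```

Because matrix multiplication is associative, the left-to-right grouping used by `reduce` gives the same product as reading the factors from right to left.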



Images of Lines Under Matrix Operators 



Many images in computer graphics are constructed by connecting points with line segments. The following theorem,
some of whose parts are proved in the exercises, is helpful for understanding how matrix operators transform such 
figures. 



THEOREM 4.11.3 

If $T: R^2 \to R^2$ is multiplication by an invertible matrix, then:

(a) The image of a straight line is a straight line. 

(b) The image of a straight line through the origin is a straight line through the origin. 

(c) The images of parallel straight lines are parallel straight lines. 

(d) The image of the line segment joining points P and Q is the line segment joining the images of P and Q. 

(e) The images of three points lie on a line if and only if the points themselves lie on a line. 



Note that it follows from Theorem 4.11.3 that if A is 
an invertible 2x2 matrix, then multiplication by A 
maps triangles into triangles and parallelograms into 
parallelograms. 



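EXAMPLE 5 Image of a Square

Sketch the image of the square with vertices (0, 0), (1, 0), (1, 1), and (0, 1) under multiplication by

$$A = \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}$$

Solution Since

$$\begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$

$$\begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \qquad \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

the image of the square is a parallelogram with vertices (0, 0), (-1, 2), (2, -1), and (1, 1) (Figure
4.11.5).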



Figure 4.11.5 [image not reproduced]



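EXAMPLE 6 Image of a Line

According to Theorem 4.11.3, the invertible matrix

$$A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}$$

maps the line $y = 2x + 1$ into another line. Find its equation.

Solution Let $(x, y)$ be a point on the line $y = 2x + 1$, and let $(x', y')$ be its image under
multiplication by $A$. Then

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

and

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}^{-1} \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}$$

so

$$x = x' - y', \qquad y = -2x' + 3y'$$

Substituting in $y = 2x + 1$ yields

$$-2x' + 3y' = 2(x' - y') + 1 \quad \text{or equivalently} \quad y' = \tfrac{4}{5}x' + \tfrac{1}{5}$$

Thus $(x', y')$ satisfies

$$y = \tfrac{4}{5}x + \tfrac{1}{5}$$

which is the equation we want.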
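The result of Example 6 can be cross-checked by mapping a few points of the line $y = 2x + 1$ and fitting a line through two of the images. This is a sketch under stated assumptions: the helper names `apply_matrix` and `line_through` are hypothetical, and the sample points are arbitrary.

```python
from fractions import Fraction


def apply_matrix(matrix, point):
    """Multiply a 2x2 matrix (nested tuples) by the point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


def line_through(p, q):
    """Slope and y-intercept of the non-vertical line through p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = Fraction(y2 - y1, x2 - x1)
    return slope, y1 - slope * x1


A = ((3, 1), (2, 1))

# Images of sample points (x, 2x + 1) on the original line.
images = [apply_matrix(A, (x, 2 * x + 1)) for x in range(-3, 4)]

slope, intercept = line_through(images[0], images[1])
print(slope, intercept)

# Every image point should satisfy y' = (4/5) x' + 1/5.
on_line = all(y == slope * x + intercept for x, y in images)
print(on_line)
```

Two image points determine the image line because, by Theorem 4.11.3, the image of a straight line under an invertible matrix is again a straight line.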



Concept Review 

• Effect of a matrix operator on the unit square 

• Geometry of one-to-one matrix operators 

• Images of lines under matrix operators 

Skills 

• Find standard matrices for geometric transformations of 

• Describe the geometric effect of an invertible matrix operator. 

• Find the image of the unit square under a matrix operator. 

• Find the image of a line under a matrix operator.



Exercise Set 4.11 

1. Find the standard matrix for the operator $T: R^2 \to R^2$ that maps a point (x, y) into

(a) its reflection about the line y = -x.

(b) its reflection through the origin.

(c) its orthogonal projection on the x-axis.

(d) its orthogonal projection on the y-axis.



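Answer:

(a) $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$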



2. For each part of Exercise 1, use the matrix you have obtained to compute T(2, 1). Check your answers geometrically
by plotting the points (2, 1) and T(2, 1).

3. Find the standard matrix for the operator $T: R^3 \to R^3$ that maps a point (x, y, z) into

(a) its reflection through the xy-plane.

(b) its reflection through the xz-plane.

(c) its reflection through the yz-plane.

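Answer:

(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$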



4. For each part of Exercise 3, use the matrix you have obtained to compute T(1, 1, 1). Check your answers
geometrically by plotting the points (1, 1, 1) and T(1, 1, 1).

5. Find the standard matrix for the operator $T: R^3 \to R^3$ that

(a) rotates each vector 90° counterclockwise about the z-axis (looking along the positive z-axis toward the origin).

(b) rotates each vector 90° counterclockwise about the x-axis (looking along the positive x-axis toward the origin).

(c) rotates each vector 90° counterclockwise about the y-axis (looking along the positive y-axis toward the origin).



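Answer:

(a) $\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{bmatrix}$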



6. Sketch the image of the rectangle with vertices (0, 0), (1, 0), (1, 2), and (0, 2) under 

(a) a reflection about the x-axis. 

(b) a reflection about the y-axis. 

(c) a compression of factor it = ^ in the y-direction. 

(d) an expansion of factor k = 2 in the x-direction.

(e) a shear of factor k = 3 in the x-direction.

(f) a shear of factor k = 2 in the y-direction.

7. Sketch the image of the square with vertices (0, 0), (1, 0), (0, 1), and (1, 1) under multiplication by

$$A = \begin{bmatrix} -3 & 0 \\ 0 & 1 \end{bmatrix}$$



Answer: 



Rectangle with vertices at (0, 0), (-3, 0), (0, 1), (-3, 1) 

8. Find the matrix that rotates a point (x, y) about the origin 

(a) 45° 

(b) 90° 

(c) 180° 

(d) 270° 



(e) -30° 

9. Find the matrix that shears by 

(a) a factor of k = 4 in the y-direction.

(b) a factor of k = -2 in the x-direction.



Answer; 
(b) 



10. Find the matrix that compresses or expands by 
(^) a factor of -j in the j-direction. 

(b) a factor of 6 in the x-direction. 

11. In each part, describe the geometric effect of multiplication by $A$.



Answer: 

(a) Expansion by a factor of 3 in the x-direction 

(b) Expansion by a factor of 5 in the y-direction and reflection about the x-axis

(c) Shearing by a factor of 4 in the x-direction 

12. In each part, express the matrix as a product of elementary matrices, and then describe the effect of multiplication by 
A in terms of compressions, expansions, reflections, and shears. 



(b)^ 

(d)^ 



-ill] 

=[2 1] 
=[i 1] 



13. In each part, find a single matrix that performs the indicated succession of operations. 

(a) Compresses by a factor of ^ in the x-direction, then expands by a factor of 5 in the j-direction. 

(b) Expands by a factor of 5 in the y-direction, then shears by a factor of 2 in the y-direction.

(c) Reflects about y =x, then rotates through an angle of 180° about the origin. 



Answer: 



(a) 

(b) 
(c) 



0 5 



14. In each part, find a single matrix that performs the indicated succession of operations. 

(a) Reflects about the y-axis, then expands by a factor of 5 in the x-direction, and then reflects about y = x.

(b) Rotates through 30° about the origin, then shears by a factor of -2 in the y-direction, and then expands by a
factor of 3 in the y-direction.

15. Use matrix inversion to show the following. 

(a) The inverse transformation for a reflection about y = x is a reflection about y = x. 

(b) The inverse transformation for a compression along an axis is an expansion along that axis. 

(c) The inverse transformation for a reflection about a coordinate axis is a reflection about that axis. 

(d) The inverse transformation for a shear along a coordinate axis is a shear along that axis. 

16. Find an equation of the image of the line $y = -4x + 3$ under multiplication by

$$A = \begin{bmatrix} 4 & -3 \\ 3 & -2 \end{bmatrix}$$



17. In parts (a) through (e), find an equation of the image of the line y = 2x under 

(a) a shear of factor 3 in the x-direction. 

(b) a compression of factor -i in the j^-direction. 

(c) a reflection about y=x. 

(d) a reflection about the y-axis.

(e) a rotation of 60° about the origin. 

Answer: 

(a) $y = \frac{2}{7}x$

(b) y=' 

(d) y= -2x 

(e) $y = -\frac{8 + 5\sqrt{3}}{11}x$

18. Find the matrix for a shear in the x-direction that transforms the triangle with vertices (0, 0), (2, 1), and (3, 0) into 
a right triangle with the right angle at the origin. 



19. (a) Show that multiplication by



maps each point in the plane onto the line $y = 2x$.



(b) It follows from part (a) that the noncollinear points (1,0), (0, 1), (—1,0) are mapped onto a line. Does this 
violate part (e) of Theorem 4.11.3? 



Answer: 

(b) No 

20. Prove part (a) of Theorem 4.11.3. [Hint: A line in the plane has an equation of the form $Ax + By + C = 0$, where $A$
and $B$ are not both zero. Use the method of Example 6 to show that the image of this line under multiplication by the
invertible matrix

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

has the equation $A'x + B'y + C = 0$, where

$$A' = (dA - cB)/(ad - bc)$$

and

$$B' = (-bA + aB)/(ad - bc)$$

Then show that $A'$ and $B'$ are not both zero to conclude that the image is a line.]

21. Use the hint in Exercise 20 to prove parts (b) and (c) of Theorem 4.11.3.

22. In each part of the accompanying figure, find the standard matrix for the operator described.

Figure Ex-22

23. In $R^3$ the shear in the xy-direction with factor k is the matrix transformation that moves each point (x, y, z) parallel
to the xy-plane to the new position (x + kz, y + kz, z). (See the accompanying figure.)

(a) Find the standard matrix for the shear in the xy-direction with factor k.

(b) How would you define the shear in the xz-direction with factor k and the shear in the yz-direction with factor k?
Find the standard matrices for these matrix transformations.

Figure Ex-23



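Answer:

(a) $\begin{bmatrix} 1 & 0 & k \\ 0 & 1 & k \\ 0 & 0 & 1 \end{bmatrix}$

(b) Shear in the xz-direction with factor k maps (x, y, z) to (x + ky, y, z + ky):

$\begin{bmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & k & 1 \end{bmatrix}$

Shear in the yz-direction with factor k maps (x, y, z) to (x, y + kx, z + kx):

$\begin{bmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ k & 0 & 1 \end{bmatrix}$

True-False Exercises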



In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The image of the unit square under a one-to-one matrix operator is a square. 
Answer: 

False 

(b) A 2 X 2 invertible matrix operator has the geometric effect of a succession of shears, compressions, expansions, and 
reflections. 

Answer: 

True 

(c) The image of a line under a one-to-one matrix operator is a line. 
Answer: 

True 

(d) Every reflection operator on $R^2$ is its own inverse.
Answer: 

True 



(e) The matrix $\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ represents reflection about a line.

Answer:

False

(f) The matrix $\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$ represents a shear.

Answer:

False

(g) 



[III 



The matrix represents an expansion. 



Answer: 

True 






4.12 Dynamical Systems and Markov Chains 



In this optional section we will show how matrix methods can be used to analyze the behavior of physical systems that 
evolve over time. The methods that we will study here have been applied to problems in business, ecology, 
demographics, sociology, and most of the physical sciences. 



Dynamical Systems

A dynamical system is a finite set of variables whose values change with time. The value of a variable at a point in time
is called the state of the variable at that time, and the vector formed from these states is called the state of the
dynamical system at that time. Our primary objective in this section is to analyze how the state of a dynamical system
changes with time. Let us begin with an example.

EXAMPLE 1 Market Share as a Dynamical System

Suppose that two competing television channels, channel 1 and channel 2, each have 50% of the viewer
market at some initial point in time. Assume that over each one-year period channel 1 captures 10% of
channel 2's share, and channel 2 captures 20% of channel 1's share (see Figure 4.12.1). What is each
channel's market share after one year?

Figure 4.12.1 [image not reproduced: channel 1 loses 20% and holds 80%; channel 2 loses 10% and holds 90%]



Solution Let us begin by introducing the time-dependent variables

$x_1(t)$ = fraction of the market held by channel 1 at time $t$
$x_2(t)$ = fraction of the market held by channel 2 at time $t$

and the column vector

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

whose first entry is channel 1's fraction of the market at time $t$ in years and whose second entry is channel 2's
fraction of the market at time $t$ in years.

The variables $x_1(t)$ and $x_2(t)$ form a dynamical system whose state at time $t$ is the vector $\mathbf{x}(t)$. If we
take $t = 0$ to be the starting point at which the two channels had 50% of the market, then the state of the
system at that time is

$$\mathbf{x}(0) = \begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \tag{1}$$

Now let us try to find the state of the system at time $t = 1$ (one year later). Over the one-year period,
channel 1 retains 80% of its initial 50%, and it gains 10% of channel 2's initial 50%. Thus,

$$x_1(1) = 0.8(0.5) + 0.1(0.5) = 0.45 \tag{2}$$

Similarly, channel 2 gains 20% of channel 1's initial 50%, and retains 90% of its initial 50%. Thus,

$$x_2(1) = 0.2(0.5) + 0.9(0.5) = 0.55 \tag{3}$$

Therefore, the state of the system at time $t = 1$ is

$$\mathbf{x}(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \tag{4}$$



EXAMPLE 2 Evolution of Market Share over Five Years

Track the market shares of channels 1 and 2 in Example 1 over a five-year period.

Solution To solve this problem suppose that we have already computed the market share of each
channel at time $t = k$ and we are interested in using the known values of $x_1(k)$ and $x_2(k)$ to compute the
market shares $x_1(k+1)$ and $x_2(k+1)$ one year later. The analysis is exactly the same as that used to
obtain Equations 2 and 3. Over the one-year period, channel 1 retains 80% of its starting fraction $x_1(k)$
and gains 10% of channel 2's starting fraction $x_2(k)$. Thus,

$$x_1(k+1) = (0.8)x_1(k) + (0.1)x_2(k) \tag{5}$$

Similarly, channel 2 gains 20% of channel 1's starting fraction $x_1(k)$ and retains 90% of its own starting
fraction $x_2(k)$. Thus,

$$x_2(k+1) = (0.2)x_1(k) + (0.9)x_2(k) \tag{6}$$

Equations 5 and 6 can be expressed in matrix form as

$$\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} \tag{7}$$

which provides a way of using matrix multiplication to compute the state of the system at time $t = k + 1$
from the state at time $t = k$. For example, using 1 and 7 we obtain

$$\mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \mathbf{x}(0) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}$$

which agrees with 4. Similarly,

$$\mathbf{x}(2) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} = \begin{bmatrix} 0.415 \\ 0.585 \end{bmatrix}$$

We can now continue this process, using Formula 7 to compute $\mathbf{x}(3)$ from $\mathbf{x}(2)$, then $\mathbf{x}(4)$ from $\mathbf{x}(3)$,
and so on. This yields (verify)

$$\mathbf{x}(3) = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}, \quad \mathbf{x}(4) = \begin{bmatrix} 0.37335 \\ 0.62665 \end{bmatrix}, \quad \mathbf{x}(5) = \begin{bmatrix} 0.361345 \\ 0.638655 \end{bmatrix} \tag{8}$$

Thus, after five years, channel 1 will hold about 36% of the market and channel 2 will hold about 64% of
the market.
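The year-by-year computation in Example 2 can be reproduced with a short loop. The sketch below is one possible way to do the "verify" step; the function names `next_state` and `yearly_states` are hypothetical, and exact fractions are used so that the results match the decimals in Formula 8 without rounding error.

```python
from fractions import Fraction

# Matrix from Formula 7, stored as exact fractions.
P = ((Fraction(8, 10), Fraction(1, 10)),
     (Fraction(2, 10), Fraction(9, 10)))


def next_state(state):
    """Apply Formula 7 once: the state one year later."""
    x1, x2 = state
    return (P[0][0] * x1 + P[0][1] * x2,
            P[1][0] * x1 + P[1][1] * x2)


def yearly_states(start, years):
    """List of states x(0), x(1), ..., x(years)."""
    states = [start]
    for _ in range(years):
        states.append(next_state(states[-1]))
    return states


states = yearly_states((Fraction(1, 2), Fraction(1, 2)), 5)
for year, (x1, x2) in enumerate(states):
    print(year, float(x1), float(x2))
```

Each pass through the loop is one multiplication by the matrix in Formula 7, so the fifth entry of the list is the state after five years.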



If desired, we can continue the market analysis in the last example beyond the five-year period and explore what 
happens to the market share over the long term. We did so, using a computer, and obtained the following state vectors 
(rounded to six decimal places): 



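$$\mathbf{x}(10) \approx \begin{bmatrix} 0.338041 \\ 0.661959 \end{bmatrix}, \quad \mathbf{x}(20) \approx \begin{bmatrix} 0.333466 \\ 0.666534 \end{bmatrix}, \quad \mathbf{x}(40) \approx \begin{bmatrix} 0.333333 \\ 0.666667 \end{bmatrix} \tag{9}$$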



All subsequent state vectors, when rounded to six decimal places, are the same as x(40) , so we see that the market 
shares eventually stabilize with channel 1 holding about one-third of the market and channel 2 holding about 
two-thirds. Later in this section, we will explain why this stabilization occurs. 
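The long-term values quoted above can be reproduced with the same recurrence. This is a sketch, not the computation the authors ran: the names `step`, `state_after`, and `rounded` are hypothetical, and exact fractions are iterated and only rounded for display.

```python
from fractions import Fraction

P = ((Fraction(8, 10), Fraction(1, 10)),
     (Fraction(2, 10), Fraction(9, 10)))


def step(state):
    """One year of evolution: multiply the state by P."""
    x1, x2 = state
    return (P[0][0] * x1 + P[0][1] * x2,
            P[1][0] * x1 + P[1][1] * x2)


def state_after(years, start=(Fraction(1, 2), Fraction(1, 2))):
    """State of the system after the given number of years."""
    state = start
    for _ in range(years):
        state = step(state)
    return state


def rounded(state, places=6):
    """Round each entry to the given number of decimal places."""
    return tuple(round(float(x), places) for x in state)


for years in (10, 20, 40):
    print(years, rounded(state_after(years)))

# The vector (1/3, 2/3) is left unchanged by one more year.
steady = (Fraction(1, 3), Fraction(2, 3))
print(step(steady) == steady)
```

The last line checks only that (1/3, 2/3) is a fixed point of the recurrence; why the states are drawn toward it is a separate question.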



Markov Chains 

In many dynamical systems the states of the variables are not known with certainty but can be expressed as 
probabilities; such dynamical systems are called stochastic processes (from the Greek word stochastikos, meaning
"proceeding by guesswork"). A detailed study of stochastic processes requires a precise definition of the term 
probability, which is outside the scope of this course. However, the following interpretation will suffice for our present 
purposes: 



Stated informally, the probability that an experiment or observation will have a certain outcome is 
approximately the fraction of the time that the outcome would occur if the experiment were to be repeated many 
times under constant conditions — the greater the number of repetitions, the more accurately the probability 
describes the fraction of occurrences. 



For example, when we say that the probability of tossing heads with a fair coin is $\frac{1}{2}$, we mean that if the coin were

tossed many times under constant conditions, then we would expect about half of the outcomes to be heads. 
Probabilities are often expressed as decimals or percentages. Thus, the probability of tossing heads with a fair coin can 
also be expressed as 0.5 or 50%. 

If an experiment or observation has n possible outcomes, then the probabilities of those outcomes must be nonnegative 
fractions whose sum is 1 . The probabilities are nonnegative because each describes the fraction of occurrences of an 
outcome over the long term, and the sum is 1 because they account for all possible outcomes. For example, if a box 
containing 10 balls has one red ball, three green balls, and six yellow balls, and if a ball is drawn at random from the 
box, then the probabilities of the various outcomes are 

$p_1$ = prob(red) = 1/10 = 0.1

$p_2$ = prob(green) = 3/10 = 0.3

$p_3$ = prob(yellow) = 6/10 = 0.6

Each probability is a nonnegative fraction and

$$p_1 + p_2 + p_3 = 0.1 + 0.3 + 0.6 = 1$$

In a stochastic process with $n$ possible states, the state vector at each time $t$ has the form

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix}$$

where $x_1(t)$ is the probability that the system is in state 1, $x_2(t)$ is the probability that the system is in state 2,
and so on, up to $x_n(t)$, the probability that the system is in state $n$.

The entries in this vector must add up to 1 since they account for all n possibilities. In general, a vector with 
nonnegative entries that add up to 1 is called a probability vector. 

EXAMPLE 3 Example 1 Revisited from the Probability Viewpoint

Observe that the state vectors in Example 1 and Example 2 are all probability vectors. This is to be 
expected since the entries in each state vector are the fractional market shares of the channels, and together 
they account for the entire market. In practice, it is preferable to interpret the entries in the state vectors as 
probabilities rather than exact market fractions, since market information is usually obtained by statistical 
sampling procedures with intrinsic uncertainties. Thus, for example, the state vector 



$$\mathbf{x}(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}$$



which we interpreted in Example 1 to mean that channel 1 has 45% of the market and channel 2 has 55%, 
can also be interpreted to mean that an individual picked at random from the market will be a channel 1 
viewer with probability 0.45 and a channel 2 viewer with probability 0.55. 






A square matrix, each of whose columns is a probability vector, is called a stochastic matrix. Such matrices commonly 
occur in formulas that relate successive states of a stochastic process. For example, the state vectors $\mathbf{x}(k+1)$ and
$\mathbf{x}(k)$ in 7 are related by an equation of the form $\mathbf{x}(k+1) = P\mathbf{x}(k)$ in which



$$P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \tag{10}$$



is a stochastic matrix. It should not be surprising that the column vectors of P are probability vectors, since the entries 
in each column provide a breakdown of what happens to each channel's market share over the year — the entries in 
column 1 convey that each year channel 1 retains 80% of its market share and loses 20%; and the entries in column 2
convey that each year channel 2 retains 90% of its market share and loses 10%. The entries in 10 can also be viewed as
probabilities:

$p_{11}$ = 0.8 = probability that a channel 1 viewer remains a channel 1 viewer
$p_{21}$ = 0.2 = probability that a channel 1 viewer becomes a channel 2 viewer
$p_{12}$ = 0.1 = probability that a channel 2 viewer becomes a channel 1 viewer
$p_{22}$ = 0.9 = probability that a channel 2 viewer remains a channel 2 viewer
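The two definitions (probability vector and stochastic matrix) translate directly into checks that can be run on a candidate matrix. The sketch below is illustrative; the function names `is_probability_vector`, `columns`, and `is_stochastic` are hypothetical, and exact fractions are used so that the "sum is 1" test is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def is_probability_vector(v):
    """True if all entries are nonnegative and they add up to 1."""
    return all(x >= 0 for x in v) and sum(v) == 1


def columns(m):
    """Columns of a matrix stored as a tuple of rows."""
    return [tuple(row[j] for row in m) for j in range(len(m[0]))]


def is_stochastic(m):
    """True if m is square and every column is a probability vector."""
    square = all(len(row) == len(m) for row in m)
    return square and all(is_probability_vector(c) for c in columns(m))


# The matrix in Formula 10.
P = ((Fraction(8, 10), Fraction(1, 10)),
     (Fraction(2, 10), Fraction(9, 10)))

# A matrix whose rows, but not columns, sum to 1.
Q = ((Fraction(1, 2), Fraction(1, 2)),
     (Fraction(1, 4), Fraction(3, 4)))

print(is_stochastic(P), is_stochastic(Q))
```

Note that the test is on columns, matching the convention of this section; a matrix whose rows sum to 1 need not pass.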

Example 1 is a special case of a large class of stochastic processes, called Markov chains. 




Andrei Andreyevich Markov (1856-1922) 

Historical Note Markov chains are named in honor of the Russian mathematician A. A. Markov, a lover of 
poetry, who used them to analyze the alternation of vowels and consonants in the poem Eugene Onegin by 
Pushkin. Markov believed that the only applications of his chains were to the analysis of literary works, so he 
would be astonished to learn that his discovery is used today in the social sciences, quantum theory, and 
genetics! 

[Image: wikipedia]



DEFINITION 1

A Markov chain is a dynamical system whose state vectors at a succession of time intervals are probability
vectors and for which the state vectors at successive time intervals are related by an equation of the form

$$\mathbf{x}(k+1) = P\mathbf{x}(k)$$

in which $P = [p_{ij}]$ is a stochastic matrix and $p_{ij}$ is the probability that the system will be in state $i$ at time
$t = k + 1$ if it is in state $j$ at time $t = k$. The matrix $P$ is called the transition matrix for the system.



Remark Note that in this definition the row index $i$ corresponds to the later state and the column index $j$ to the earlier
state (Figure 4.12.2). 



Figure 4.12.2 [image not reproduced: columns are indexed by the state at time $t = k$ and rows by the state at time $t = k + 1$. The entry $p_{ij}$ is the probability that the system is in state $i$ at time $t = k + 1$ if it is in state $j$ at time $t = k$.]



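EXAMPLE 4 Wildlife Migration as a Markov Chain

Suppose that a tagged lion can migrate over three adjacent game reserves in search of food, reserve 1,
reserve 2, and reserve 3. Based on data about the food resources, researchers conclude that the monthly
migration pattern of the lion can be modeled by a Markov chain with transition matrix

$$P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}$$

in which columns 1, 2, 3 correspond to the reserve at time $t = k$ and rows 1, 2, 3 correspond to the reserve at
time $t = k + 1$ (see Figure 4.12.3). That is,

$p_{11}$ = 0.5 = probability that the lion will stay in reserve 1 when it is in reserve 1
$p_{12}$ = 0.4 = probability that the lion will move from reserve 2 to reserve 1
$p_{13}$ = 0.6 = probability that the lion will move from reserve 3 to reserve 1
$p_{21}$ = 0.2 = probability that the lion will move from reserve 1 to reserve 2
$p_{22}$ = 0.2 = probability that the lion will stay in reserve 2 when it is in reserve 2
$p_{23}$ = 0.3 = probability that the lion will move from reserve 3 to reserve 2
$p_{31}$ = 0.3 = probability that the lion will move from reserve 1 to reserve 3
$p_{32}$ = 0.4 = probability that the lion will move from reserve 2 to reserve 3
$p_{33}$ = 0.1 = probability that the lion will stay in reserve 3 when it is in reserve 3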


Assuming that $t$ is in months and the lion is released in reserve 2 at time $t = 0$, track its probable
locations over a six-month period.



Figure 4.12.3 [image not reproduced: transition diagram for reserves 1, 2, and 3 with arrows labeled by the entries of $P$]



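Solution Let $x_1(k)$, $x_2(k)$, and $x_3(k)$ be the probabilities that the lion is in reserve 1, 2, or 3,
respectively, at time $t = k$, and let

$$\mathbf{x}(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix}$$

be the state vector at that time. Since we know with certainty that the lion is in reserve 2 at time $t = 0$, the
initial state vector is

$$\mathbf{x}(0) = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

We leave it for you to show that the state vectors over a six-month period are

$$\mathbf{x}(1) = P\mathbf{x}(0) = \begin{bmatrix} 0.400 \\ 0.200 \\ 0.400 \end{bmatrix}, \quad \mathbf{x}(2) = P\mathbf{x}(1) = \begin{bmatrix} 0.520 \\ 0.240 \\ 0.240 \end{bmatrix}, \quad \mathbf{x}(3) = P\mathbf{x}(2) = \begin{bmatrix} 0.500 \\ 0.224 \\ 0.276 \end{bmatrix}$$

$$\mathbf{x}(4) = P\mathbf{x}(3) \approx \begin{bmatrix} 0.505 \\ 0.228 \\ 0.267 \end{bmatrix}, \quad \mathbf{x}(5) = P\mathbf{x}(4) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}, \quad \mathbf{x}(6) = P\mathbf{x}(5) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}$$

As in Example 2, the state vectors here seem to stabilize over time with a probability of approximately
0.504 that the lion is in reserve 1, a probability of approximately 0.227 that it is in reserve 2, and a
probability of approximately 0.269 that it is in reserve 3.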
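The six state vectors can be generated with a general matrix-vector product. The following is a sketch of one way to carry out the computation left to the reader; the names `mat_vec` and `trajectory` are hypothetical, and `F` is just a local alias for `fractions.Fraction`.

```python
from fractions import Fraction as F

# Transition matrix for the three reserves (columns sum to 1).
P = ((F(5, 10), F(4, 10), F(6, 10)),
     (F(2, 10), F(2, 10), F(3, 10)),
     (F(3, 10), F(4, 10), F(1, 10)))


def mat_vec(m, v):
    """Product of a matrix (tuple of rows) with a vector."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in m)


def trajectory(m, start, steps):
    """States x(0), x(1), ..., x(steps) of the chain x(k+1) = m x(k)."""
    states = [start]
    for _ in range(steps):
        states.append(mat_vec(m, states[-1]))
    return states


# The lion starts in reserve 2 with certainty.
states = trajectory(P, (F(0), F(1), F(0)), 6)
for month, state in enumerate(states):
    print(month, [round(float(x), 3) for x in state])
```

The first three states come out exactly; from month 4 onward the printed values are rounded to three decimal places, as in the solution.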



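Markov Chains in Terms of Powers of the Transition Matrix

In a Markov chain with an initial state of $\mathbf{x}(0)$, the successive state vectors are

$$\mathbf{x}(1) = P\mathbf{x}(0), \quad \mathbf{x}(2) = P\mathbf{x}(1), \quad \mathbf{x}(3) = P\mathbf{x}(2), \quad \mathbf{x}(4) = P\mathbf{x}(3), \ldots$$

For brevity, it is common to denote $\mathbf{x}(k)$ by $\mathbf{x}_k$, which allows us to write the successive state vectors more briefly as

$$\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P\mathbf{x}_1, \quad \mathbf{x}_3 = P\mathbf{x}_2, \quad \mathbf{x}_4 = P\mathbf{x}_3, \ldots \tag{11}$$

Alternatively, these state vectors can be expressed in terms of the initial state vector $\mathbf{x}_0$ as

$$\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P(P\mathbf{x}_0) = P^2\mathbf{x}_0, \quad \mathbf{x}_3 = P(P^2\mathbf{x}_0) = P^3\mathbf{x}_0, \quad \mathbf{x}_4 = P(P^3\mathbf{x}_0) = P^4\mathbf{x}_0, \ldots$$

from which it follows that

$$\mathbf{x}_k = P^k\mathbf{x}_0 \tag{12}$$

Note that Formula 12 makes it possible to compute the state vector $\mathbf{x}_k$ without first computing the
earlier state vectors as required in Formula 11.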

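EXAMPLE 5 Finding a State Vector Directly from $\mathbf{x}_0$

Use Formula 12 to find the state vector $\mathbf{x}(3)$ in Example 2.

Solution From 1 and 7, the initial state vector and transition matrix are

$$\mathbf{x}_0 = \mathbf{x}(0) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$$

We leave it for you to calculate $P^3$ and show that

$$\mathbf{x}(3) = \mathbf{x}_3 = P^3\mathbf{x}_0 = \begin{bmatrix} 0.562 & 0.219 \\ 0.438 & 0.781 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}$$

which agrees with the result in 8.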

Long-Term Behavior of a Markov Chain 



We have seen two examples of Markov chains in which the state vectors seem to stabilize after a period of time. Thus,
it is reasonable to ask whether all Markov chains have this property. The following example shows that this is not the 
case. 



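EXAMPLE 6 A Markov Chain That Does Not Stabilize

The matrix

\[ P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

is stochastic and hence can be regarded as the transition matrix for a Markov chain. A simple calculation
shows that $P^2 = I$, from which it follows that

\[ I = P^2 = P^4 = P^6 = \cdots \quad \text{and} \quad P = P^3 = P^5 = P^7 = \cdots \]

Thus, the successive states in the Markov chain with initial vector $\mathbf{x}_0$ are

\[ \mathbf{x}_0, \; P\mathbf{x}_0, \; \mathbf{x}_0, \; P\mathbf{x}_0, \; \mathbf{x}_0, \ldots \]

which oscillate between $\mathbf{x}_0$ and $P\mathbf{x}_0$. Thus, the Markov chain does not stabilize unless both components
of $\mathbf{x}_0$ are $\tfrac{1}{2}$ (verify).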
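The oscillation in this example is easy to observe by listing a few states. The sketch below uses exact fractions; the function names `step` and `orbit` and the starting vector $(\tfrac14, \tfrac34)$ are illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def step(P, x):
    """One transition of the chain: return P times x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def orbit(P, x0, steps):
    """Return the list of states x0, P x0, P^2 x0, ..., up to `steps` transitions."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(step(P, states[-1]))
    return states


P = [[0, 1], [1, 0]]  # the transition matrix of Example 6

# An initial vector with unequal components keeps swapping its entries.
lopsided = orbit(P, [Fraction(1, 4), Fraction(3, 4)], 4)

# With both components equal to 1/2, every state is the same vector.
balanced = orbit(P, [Fraction(1, 2), Fraction(1, 2)], 4)

for state in lopsided:
    print([str(entry) for entry in state])
```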



A precise definition of what it means for a sequence of numbers or vectors to stabilize is given in calculus; however, 
that level of precision will not be needed here. Stated informally, we will say that a sequence of vectors 

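\[ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots \]

approaches a limit $\mathbf{q}$ or that it converges to $\mathbf{q}$ if all entries in $\mathbf{x}_k$ can be made as close as we like to the corresponding
entries in the vector $\mathbf{q}$ by taking $k$ sufficiently large. We denote this by writing $\mathbf{x}_k \to \mathbf{q}$ as $k \to \infty$.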

We saw in Example 6 that the state vectors of a Markov chain need not approach a limit in all cases. However, by 
imposing a mild condition on the transition matrix of a Markov chain, we can guarantee that the state vectors will 
approach a limit. 




DEFINITION 2 

A stochastic matrix P is said to be regular if P or some positive power of P has all positive entries, and a 
Markov chain whose transition matrix is regular is said to be a regular Markov chain. 



EXAMPLE 7 Regular Stochastic Matrices

The transition matrices in Example 2 and Example 4 are regular because their entries are positive. The
matrix

\[ P = \begin{bmatrix} 0.5 & 1 \\ 0.5 & 0 \end{bmatrix} \]

is regular because

\[ P^2 = \begin{bmatrix} 0.75 & 0.5 \\ 0.25 & 0.5 \end{bmatrix} \]

has positive entries. The matrix P in Example 6 is not regular because P and every positive power of P
have some zero entries (verify).
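Definition 2 can be turned into a finite test by examining successive powers of P. The sketch below stops after the power $(n-1)^2 + 1$; that cutoff is Wielandt's bound for primitive matrices, a known result that is not stated in this text and is used here as an assumption. The function name `is_regular` is illustrative.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def is_regular(P):
    """Return True if P, P^2, ..., P^((n-1)^2 + 1) contains an all-positive matrix.

    Assumes P is already known to be a stochastic matrix.
    """
    n = len(P)
    bound = (n - 1) ** 2 + 1  # Wielandt's bound (assumption, not from the text)
    power = [row[:] for row in P]
    for _ in range(bound):
        if all(entry > 0 for row in power for entry in row):
            return True
        power = mat_mul(power, P)
    return False


print(is_regular([[0.5, 1], [0.5, 0]]))      # second matrix of Example 7
print(is_regular([[0, 1], [1, 0]]))          # matrix of Example 6
print(is_regular([[0.8, 0.1], [0.2, 0.9]]))  # matrix of Example 2
```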



The following theorem, which we state without proof, is the fundamental result about the long-term behavior of 
Markov chains. 



THEOREM 4.12.1 

If P is the transition matrix for a regular Markov chain, then: 

(a) There is a unique probability vector q such that Pq = q. 

(b) For any initial probability vector $\mathbf{x}_0$, the sequence of state vectors

\[ \mathbf{x}_0, \; P\mathbf{x}_0, \; P^2\mathbf{x}_0, \ldots, \; P^k\mathbf{x}_0, \ldots \]

converges to $\mathbf{q}$.



The vector q in this theorem is called the steady-state vector of the Markov chain. It can be found by rewriting the 
equation in part (a) as 

\[ (I - P)\mathbf{q} = \mathbf{0} \]

and then solving this equation for q subject to the requirement that q be a probability vector. Here are some examples. 



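EXAMPLE 8 Example 1 and Example 2 Revisited

The transition matrix for the Markov chain in Example 2 is

\[ P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \]

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector
$\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can write as

\[ \begin{bmatrix} 0.2 & -0.1 \\ -0.2 & 0.1 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

The general solution of this system is

\[ q_1 = 0.5s, \quad q_2 = s \]

(verify), which we can write in vector form as

\[ \mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0.5s \\ s \end{bmatrix} \qquad (13) \]

For $\mathbf{q}$ to be a probability vector, we must have

\[ 1 = q_1 + q_2 = \tfrac{3}{2}s \]

which implies that $s = \tfrac{2}{3}$. Substituting this value in 13 yields the steady-state vector

\[ \mathbf{q} = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix} \]

which is consistent with the numerical results obtained in 9.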



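EXAMPLE 9 Example 4 Revisited

The transition matrix for the Markov chain in Example 4 is

\[ P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix} \]

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector
$\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can write (using fractions) as

\[ \begin{bmatrix} \tfrac{1}{2} & -\tfrac{2}{5} & -\tfrac{3}{5} \\ -\tfrac{1}{5} & \tfrac{4}{5} & -\tfrac{3}{10} \\ -\tfrac{3}{10} & -\tfrac{2}{5} & \tfrac{9}{10} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad (14) \]

(We have converted to fractions to avoid roundoff error in this illustrative example.) We leave it for you
to confirm that the reduced row echelon form of the coefficient matrix is

\[ \begin{bmatrix} 1 & 0 & -\tfrac{15}{8} \\ 0 & 1 & -\tfrac{27}{32} \\ 0 & 0 & 0 \end{bmatrix} \]

and that the general solution of 14 is

\[ q_1 = \tfrac{15}{8}s, \quad q_2 = \tfrac{27}{32}s, \quad q_3 = s \qquad (15) \]

For $\mathbf{q}$ to be a probability vector we must have $q_1 + q_2 + q_3 = 1$, from which it follows that $s = \tfrac{32}{119}$
(verify). Substituting this value in 15 yields the steady-state vector

\[ \mathbf{q} = \begin{bmatrix} \tfrac{60}{119} \\ \tfrac{27}{119} \\ \tfrac{32}{119} \end{bmatrix} \approx \begin{bmatrix} 0.5042 \\ 0.2269 \\ 0.2689 \end{bmatrix} \]

(verify), which is consistent with the results obtained in Example 4.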
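The steady-state computations in the last two examples can be reproduced exactly with rational arithmetic. The sketch below is one possible approach, not the procedure used in the text: because the columns of P sum to 1, the rows of $I - P$ sum to the zero row, so one equation of $(I - P)\mathbf{q} = \mathbf{0}$ is redundant and can be replaced by the normalization $q_1 + \cdots + q_n = 1$. The function name `steady_state` is illustrative, and the code assumes P is a regular stochastic matrix so that the resulting system has a unique solution.

```python
from fractions import Fraction


def steady_state(P):
    """Solve (I - P) q = 0 together with sum(q) = 1 by Gauss-Jordan elimination.

    Assumes P is a regular stochastic matrix (list of rows of Fractions).
    """
    n = len(P)
    # Coefficient matrix I - P, with the last (redundant) row replaced by ones.
    M = [[Fraction(int(i == j)) - Fraction(P[i][j]) for j in range(n)]
         for i in range(n)]
    M[-1] = [Fraction(1)] * n
    rhs = [Fraction(0)] * (n - 1) + [Fraction(1)]

    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        # Scale the pivot row so that the pivot equals 1.
        pv = M[col][col]
        M[col] = [e / pv for e in M[col]]
        rhs[col] /= pv
        # Clear the rest of the column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
                rhs[r] -= f * rhs[col]
    return rhs


F = Fraction
P2 = [[F(8, 10), F(1, 10)], [F(2, 10), F(9, 10)]]          # Example 2
P4 = [[F(5, 10), F(4, 10), F(6, 10)],
      [F(2, 10), F(2, 10), F(3, 10)],
      [F(3, 10), F(4, 10), F(1, 10)]]                       # Example 4
q2 = steady_state(P2)
q4 = steady_state(P4)
print([str(x) for x in q2], [str(x) for x in q4])
```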



Concept Review 

• Dynamical system 

• State of a variable 

• State of a dynamical system 

• Stochastic process 

• Probability 

• Probability vector 

• Stochastic matrix 

• Markov chain 

• Transition matrix 

• Regular stochastic matrix 

• Regular Markov chain 

• Steady-state vector 

Skills 

• Determine whether a matrix is stochastic. 

• Compute the state vectors from a transition matrix and an initial state. 

• Determine whether a stochastic matrix is regular. 

• Determine whether a Markov chain is regular. 

• Find the steady-state vector for a regular transition matrix. 



Exercise Set 4.12 



In Exercises 1-2, determine whether A is a stochastic matrix. If A is not stochastic, then explain why not.

1. (a) \[ A = \begin{bmatrix} 0.4 & 0.3 \\ 0.6 & 0.7 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{bmatrix} \]



(c) 



A = 



(d) 



1 4 4 



2 

r, 1 



1 


1 


3 


3 


1 


1 


6 


3 


1 


1 


2 


3 



1 

2 

2 
'2 

4 1 



Answer: 



(a) Stochastic 

(b) Not stochastic 

(c) Stochastic 

(d) Not stochastic 



(C) 

(d) 



0.2 0.9 

0.8 0.1 

0.2 0.8 

0.9 0.1 



J_ 
12 



9 6 



A^ 



-14 4 



0 4 4 



1 1 

3 2 

1 i 

3 2 



i 0 



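In Exercises 3-4, use Formulas 11 and 12 to compute the state vector $\mathbf{x}_4$ in two different ways.

3. \[ P = \begin{bmatrix} 0.5 & 0.6 \\ 0.5 & 0.4 \end{bmatrix} \]

Answer:

\[ \mathbf{x}_4 \approx \begin{bmatrix} 0.54545 \\ 0.45455 \end{bmatrix} \]

4. \[ P = \begin{bmatrix} 0.8 & 0.5 \\ 0.2 & 0.5 \end{bmatrix}; \quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]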



In Exercises 5-6, determine whether P is a regular stochastic matrix.



4 

5 



6 

7 



(b) 



P = 



(c) 



4 0 



^ 1 



4 1 



P = 



Answer: 



(a) Regular 

(b) Not regular 

(c) Regular 



6. 



(a) 



P = 



(b) 



P = 



(c) 



3 1 

4 3 

1 1 

4 3 



In Exercises 7-10, verify that P is a regular stochastic matrix, and find the steady-state vector for the associated 
Markov chain. 



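7. \[ P = \begin{bmatrix} \tfrac{1}{4} & \tfrac{2}{3} \\ \tfrac{3}{4} & \tfrac{1}{3} \end{bmatrix} \]

Answer:

\[ \mathbf{q} = \begin{bmatrix} \tfrac{8}{17} \\ \tfrac{9}{17} \end{bmatrix} \]

8. \[ P = \begin{bmatrix} 0.2 & 0.6 \\ 0.8 & 0.4 \end{bmatrix} \]

9. \[ P = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{4} & 0 & \tfrac{2}{3} \end{bmatrix} \]

Answer:

\[ \mathbf{q} = \begin{bmatrix} \tfrac{4}{11} \\ \tfrac{4}{11} \\ \tfrac{3}{11} \end{bmatrix} \]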


10. 



P = 



1 
3 

0 4 ^ 



1 

4 
3 
4 



^ 0 



11. Consider a Markov process with transition matrix 

            State 1   State 2
State 1  [   0.2       0.1   ]
State 2  [   0.8       0.9   ]



(a) What does the entry 0.2 represent? 

(b) What does the entry 0.1 represent? 

(c) If the system is in state 1 initially, what is the probability that it will be in state 2 at the next observation? 

(d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the 
next observation? 



Answer: 

(a) Probability that something in state 1 stays in state 1 

(b) Probability that something in state 2 moves to state 1 

(c) 0.8 

(d) 0.85 

12. Consider a Markov process with transition matrix 

Statel State 2 



State 1 
Statel 



0 :k 



1 ^ 



(a) What does the entry y represent? 

(b) What does the entry 0 represent? 



(c) If the system is in state 1 initially, what is the probability that it will be in state 1 at the next observation? 

(d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the 
next observation? 



13. On a given day the air quality in a certain city is either good or bad. Records show that when the air quality is good 
on one day, then there is a 95% chance that it will be good the next day, and when the air quality is bad on one day, 
then there is a 45% chance that it will be bad the next day. 

(a) Find a transition matrix for this phenomenon. 

(b) If the air quality is good today, what is the probability that it will be good two days from now? 

(c) If the air quality is bad today, what is the probability that it will be bad three days from now? 

(d) If there is a 20% chance that the air quality will be good today, what is the probability that it will be good 
tomorrow? 

Answer: 



(a) \[ \begin{bmatrix} 0.95 & 0.55 \\ 0.05 & 0.45 \end{bmatrix} \]



(b) 0.93 

(c) 0.142 

(d) 0.63 

14. In a laboratory experiment, a mouse can choose one of two food types each day, type I or type II. Records show that 
if the mouse chooses type I on a given day, then there is a 75% chance that it will choose type I the next day, and if 
it chooses type II on one day, then there is a 50% chance that it will choose type II the next day. 

(a) Find a transition matrix for this phenomenon. 

(b) If the mouse chooses type I today, what is the probability that it will choose type I two days from now? 

(c) If the mouse chooses type II today, what is the probability that it will choose type II three days from now? 

(d) If there is a 10% chance that the mouse will choose type I today, what is the probability that it will choose type 
I tomorrow? 

15. Suppose that at some initial point in time 100,000 people live in a certain city and 25,000 people live in its suburbs. 
The Regional Planning Commission determines that each year 5% of the city population moves to the suburbs and 
3% of the suburban population moves to the city. 

(a) Assuming that the total population remains constant, make a table that shows the populations of the city and its 
suburbs over a five-year period (round to the nearest integer). 

(b) Over the long term, how will the population be distributed between the city and its suburbs? 
Answer: 



(a)

Year       1        2        3        4        5
City       95,750   91,840   88,243   84,933   81,889
Suburbs    29,250   33,160   36,757   40,067   43,111

(b)

City       46,875
Suburbs    78,125



16. Suppose that two competing television stations, station 1 and station 2, each have 50% of the viewer market at some 
initial point in time. Assume that over each one-year period station 1 captures 5% of station 2's market share and 
station 2 captures 10% of station I's market share. 

(a) Make a table that shows the market share of each station over a five-year period. 

(b) Over the long term, how will the market share be distributed between the two stations? 

17. Suppose that a car rental agency has three locations, numbered 1, 2, and 3. A customer may rent a car from any of 
the three locations and return it to any of the three locations. Records show that cars are rented and returned in 
accordance with the following probabilities: 

                          Rented from Location
                          1       2       3
Returned to        1      1/10    1/5     3/5
Location           2      4/5     3/10    1/5
                   3      1/10    1/2     1/5



(a) Assuming that a car is rented from location 1, what is the probability that it will be at location 1 after two 
rentals? 

(b) Assuming that this dynamical system can be modeled as a Markov chain, find the steady-state vector. 

(c) If the rental agency owns 120 cars, how many parking spaces should it allocate at each location to be 
reasonably certain that it will have enough spaces for the cars over the long term? Explain your reasoning. 



Answer: 


(a) $\tfrac{23}{100}$

(b) \[ \mathbf{q} = \begin{bmatrix} \tfrac{46}{159} \\ \tfrac{22}{53} \\ \tfrac{47}{159} \end{bmatrix} \]

(c) 35, 50, 35



18. Physical traits are determined by the genes that an offspring receives from its parents. In the simplest case a trait in 
the offspring is determined by one pair of genes, one member of the pair inherited from the male parent and the 
other from the female parent. Typically, each gene in a pair can assume one of two forms, called alleles, denoted by 
A and a. This leads to three possible pairings: 

AA, Aa, aa 

called genotypes (the pairs Aa and aA determine the same trait and hence are not distinguished from one another). It 
is shown in the study of heredity that if a parent of known genotype is crossed with a random parent of unknown 
genotype, then the offspring will have the genotype probabilities given in the following table, which can be viewed 
as a transition matrix for a Markov process: 



                                Genotype of Parent
                                AA      Aa      aa
Genotype of Offspring    AA     1/2     1/4     0
                         Aa     1/2     1/2     1/2
                         aa     0       1/4     1/2



Thus, for example, the offspring of a parent of genotype AA that is crossed at random with a parent of unknown 
genotype will have a 50% chance of being AA, a 50% chance of being Aa, and no chance of being aa. 

(a) Show that the transition matrix is regular. 

(b) Find the steady-state vector, and discuss its physical interpretation. 
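19. Fill in the missing entries of the stochastic matrix

\[ P = \begin{bmatrix} \tfrac{7}{10} & * & \tfrac{1}{5} \\ * & \tfrac{3}{10} & * \\ \tfrac{1}{10} & \tfrac{3}{5} & \tfrac{3}{10} \end{bmatrix} \]

and find its steady-state vector.
Answer:

\[ P = \begin{bmatrix} \tfrac{7}{10} & \tfrac{1}{10} & \tfrac{1}{5} \\ \tfrac{1}{5} & \tfrac{3}{10} & \tfrac{1}{2} \\ \tfrac{1}{10} & \tfrac{3}{5} & \tfrac{3}{10} \end{bmatrix}; \quad \mathbf{q} = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{1}{3} \\ \tfrac{1}{3} \end{bmatrix} \]
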



20. If P is an $n \times n$ stochastic matrix, and if M is a $1 \times n$ matrix whose entries are all 1's, then MP = ______.

21. If P is a regular stochastic matrix with steady-state vector $\mathbf{q}$, what can you say about the sequence of products

\[ P\mathbf{q}, \; P^2\mathbf{q}, \; P^3\mathbf{q}, \ldots, \; P^k\mathbf{q}, \ldots \]

as $k \to \infty$?
Answer:

$P^k\mathbf{q} = \mathbf{q}$ for every positive integer k

22.

(a) If P is a regular $n \times n$ stochastic matrix with steady-state vector $\mathbf{q}$, and if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard unit
vectors in column form, what can you say about the behavior of the sequence

\[ P\mathbf{e}_i, \; P^2\mathbf{e}_i, \; P^3\mathbf{e}_i, \ldots, \; P^k\mathbf{e}_i, \ldots \]

as $k \to \infty$ for each $i = 1, 2, \ldots, n$?

(b) What does this tell you about the behavior of the column vectors of $P^k$ as $k \to \infty$?

23. Prove that the product of two stochastic matrices is a stochastic matrix. [Hint: Write each column of the product as
a linear combination of the columns of the first factor.]

24. Prove that if P is a stochastic matrix whose entries are all greater than or equal to $\rho$, then the entries of $P^2$ are
greater than or equal to $\rho$.

True-False Exercises 



In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 



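(a) The vector

\[ \begin{bmatrix} \tfrac{1}{3} \\ 0 \\ \tfrac{2}{3} \end{bmatrix} \]

is a probability vector.

Answer:

True

(b) The matrix

\[ \begin{bmatrix} 0.2 & 1 \\ 0.8 & 0 \end{bmatrix} \]

is a regular stochastic matrix.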



Answer: 

True 

(c) The column vectors of a transition matrix are probability vectors. 
Answer: 

True 

(d) A steady-state vector for a Markov chain with transition matrix P is any solution of the linear system (/ — P)q = 0. 
Answer: 

False 

(e) The square of every regular stochastic matrix is stochastic. 
Answer: 

True 
Chapter 4 Supplementary Exercises

1. Let V be the set of all ordered triples of real numbers, and consider the following addition and scalar
multiplication operations on $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$:

\[ \mathbf{u} + \mathbf{v} = (u_1 + v_1, \; u_2 + v_2, \; u_3 + v_3), \quad k\mathbf{u} = (ku_1, 0, 0) \]

(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (3, -2, 4)$, $\mathbf{v} = (1, 5, -2)$, and $k = -1$.

(b) In words, explain why V is closed under addition and scalar multiplication.

(c) Since the addition operation on V is the standard addition operation on $R^3$, certain vector space axioms
hold for V because they are known to hold for $R^3$. Which axioms in Definition 1 of Section 4.1 are
they?

(d) Show that Axioms 7, 8, and 9 hold.

(e) Show that Axiom 10 fails for the given operations.

Answer:

(a) $\mathbf{u} + \mathbf{v} = (4, 3, 2)$, $k\mathbf{u} = (-3, 0, 0)$

(c) Axioms 1-5 

2. In each part, the solution space of the system is a subspace of $R^3$ and so must be a line through the origin,
a plane through the origin, all of $R^3$, or the origin only. For each system, determine which is the case. If
the subspace is a plane, find an equation for it, and if it is a line, find parametric equations.

(a) $0x + 0y + 0z = 0$

(b) $2x - 3y + z = 0$
    $6x - 9y + 3z = 0$
    $-4x + 6y - 2z = 0$

(c) $x - 2y + 7z = 0$
    $-4x + 8y + 5z = 0$
    $2x - 4y + 3z = 0$

(d) $x + 4y + 8z = 0$
    $2x + 5y + 6z = 0$
    $3x + y - 4z = 0$

3. For what values of s is the solution space of

    $x_1 + x_2 + sx_3 = 0$
    $x_1 + sx_2 + x_3 = 0$
    $sx_1 + x_2 + x_3 = 0$

the origin only, a line through the origin, a plane through the origin, or all of $R^3$?

Answer:

If $s \neq 1, -2$, the solution space is the origin. If $s = 1$, the solution space is a plane through the origin. If
$s = -2$, the solution space is a line through the origin.

4. (a) Express $(4a, a - b, a + 2b)$ as a linear combination of $(4, 1, 1)$ and $(0, -1, 2)$.

(b) Express $(3a + b + 3c, -a + 4b - c, 2a + b + 2c)$ as a linear combination of $(3, -1, 2)$ and $(1, 4, 1)$.

(c) Express $(2a - b + 4c, 3a - c, 4b + c)$ as a linear combination of three nonzero vectors.

5. Let W be the space spanned by $\mathbf{f} = \sin x$ and $\mathbf{g} = \cos x$.

(a) Show that for any value of $\theta$, $\mathbf{f}_1 = \sin(x + \theta)$ and $\mathbf{g}_1 = \cos(x + \theta)$ are vectors in W.

(b) Show that $\mathbf{f}_1$ and $\mathbf{g}_1$ form a basis for W.

6. (a) Express $\mathbf{v} = (1, 1)$ as a linear combination of $\mathbf{v}_1 = (1, -1)$, $\mathbf{v}_2 = (3, 0)$, and $\mathbf{v}_3 = (2, 1)$ in two
different ways.

(b) Explain why this does not violate Theorem 4.4.1.

7. Let A be an $n \times n$ matrix, and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be linearly independent vectors in $R^n$ expressed as $n \times 1$
matrices. What must be true about A for $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n$ to be linearly independent?

Answer:

A must be invertible

8. Must a basis for $P_n$ contain a polynomial of degree k for each $k = 0, 1, 2, \ldots, n$? Justify your answer.

9. Find the rank and nullity of the following checkerboard matrices.

(a) The $3 \times 3$ checkerboard matrix.

(b) The $4 \times 4$ checkerboard matrix.

(c) The $n \times n$ checkerboard matrix.

Answer:

(a) Rank = 2, nullity = 1

(b) Rank = 2, nullity = 2

(c) Rank = 2, nullity = $n - 2$

10. For the purpose of this exercise, let us define an "X-matrix" to be a square matrix with an odd number of
rows and columns that has 0's everywhere except on the two diagonals where it has 1's. Find the rank and
nullity of the following X-matrices.

(a) \[ \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix} \]

(c) the X-matrix of size $(2n + 1) \times (2n + 1)$

11. In each part, show that the stated set of polynomials is a subspace of $P_n$ and find a basis for it.

(a) All polynomials in $P_n$ such that $p(-x) = p(x)$.

(b) All polynomials in $P_n$ such that $p(0) = 0$.

Answer:

(a) $\{1, x^2, x^4, \ldots, x^{2m}\}$ where $2m = n$ if n is even and $2m = n - 1$ if n is odd.

(b) $\{x, x^2, x^3, \ldots, x^n\}$

12. (Calculus required) Show that the set of all polynomials in $P_n$ that have a horizontal tangent at $x = 0$ is a
subspace of $P_n$. Find a basis for this subspace.

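13. (a) Find a basis for the vector space of all $3 \times 3$ symmetric matrices.

(b) Find a basis for the vector space of all $3 \times 3$ skew-symmetric matrices.

Answer:

(a) \[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \]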

14. Various advanced texts in linear algebra prove the following determinant criterion for rank: The rank of a 
matrix A is r if and only if A has some rxr submatrix with a nonzero determinant, and all square 
submatrices of larger size have determinant zero. [Note: A submatrix of A is any matrix obtained by 
deleting rows or columns of A. The matrix A itself is also considered to be a submatrix of A.] In each part,
use this criterion to find the rank of the matrix. 

2 0" 
4 -1 



(b) 
(c) 



2 3 
4 6 

0 1 
-1 3 
-1 4 



(d) \[ \begin{bmatrix} 1 & -1 & 2 & 0 \\ 3 & 1 & 0 & 0 \\ -1 & 2 & 4 & 0 \end{bmatrix} \]



15. Use the result in Exercise 14 above to find the possible ranks for matrices of the form 



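\[ \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & a_{16} \\ 0 & 0 & 0 & 0 & 0 & a_{26} \\ 0 & 0 & 0 & 0 & 0 & a_{36} \\ 0 & 0 & 0 & 0 & 0 & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \end{bmatrix} \]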



Answer: 

Possible ranks are 2, 1, and 0. 

16. Prove: If S is a basis for a vector space V, then for any vectors $\mathbf{u}$ and $\mathbf{v}$ in V and any scalar k, the following
relationships hold.

(a) $(\mathbf{u} + \mathbf{v})_S = (\mathbf{u})_S + (\mathbf{v})_S$

(b) $(k\mathbf{u})_S = k(\mathbf{u})_S$
CHAPTER 5

Eigenvalues and Eigenvectors



CHAPTER CONTENTS 

5.1. Eigenvalues and Eigenvectors 

5.2. Diagonalization 

5.3. Complex Vector Spaces 

5.4. Differential Equations 



INTRODUCTION 

In this chapter we will focus on classes of scalars and vectors known as "eigenvalues" and 
"eigenvectors," terms derived from the German word eigen, meaning "own," "peculiar 
to," "characteristic," or "individual." The underlying idea first appeared in the study of 
rotational motion but was later used to classify various kinds of surfaces and to describe 
solutions of certain differential equations. In the early 1900s it was applied to matrices and 
matrix transformations, and today it has applications in such diverse fields as computer 
graphics, mechanical vibrations, heat flow, population dynamics, quantum mechanics, and
economics to name just a few. 
5.1 Eigenvalues and Eigenvectors

In this section we will define the notions of "eigenvalue" and "eigenvector" and discuss some of their basic 
properties. 



Definition of Eigenvalue and Eigenvector 

We begin with the main definition in this section. 

DEFINITION 1

If A is an $n \times n$ matrix, then a nonzero vector $\mathbf{x}$ in $R^n$ is called an eigenvector of A (or of the matrix
operator $T_A$) if $A\mathbf{x}$ is a scalar multiple of $\mathbf{x}$; that is,

\[ A\mathbf{x} = \lambda\mathbf{x} \]

for some scalar $\lambda$. The scalar $\lambda$ is called an eigenvalue of A (or of $T_A$), and $\mathbf{x}$ is said to be an
eigenvector corresponding to $\lambda$.

The requirement that an eigenvector be nonzero is imposed to avoid the unimportant
case $A\mathbf{0} = \lambda\mathbf{0}$, which holds for every A and $\lambda$.



In general, the image of a vector x under multiplication by a square matrix A differs from x in both magnitude 
and direction. However, in the special case where x is an eigenvector of A, multiplication by A leaves the 
direction unchanged. For example, in $R^2$ or $R^3$ multiplication by A maps each eigenvector $\mathbf{x}$ of A (if any)
along the same line through the origin as $\mathbf{x}$. Depending on the sign and magnitude of the eigenvalue $\lambda$
corresponding to $\mathbf{x}$, the operation $A\mathbf{x} = \lambda\mathbf{x}$ compresses or stretches $\mathbf{x}$ by a factor of $\lambda$, with a reversal of
direction in the case where $\lambda$ is negative (Figure 5.1.1).

Figure 5.1.1 [panels: (a) $0 \le \lambda \le 1$, (b) $\lambda \ge 1$, (c) $-1 \le \lambda \le 0$, (d) $\lambda \le -1$]



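EXAMPLE 1 Eigenvector of a 2 × 2 Matrix

The vector $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is an eigenvector of

\[ A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \]

corresponding to the eigenvalue $\lambda = 3$, since

\[ A\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3\mathbf{x} \]

Geometrically, multiplication by A has stretched the vector $\mathbf{x}$ by a factor of 3 (Figure 5.1.2).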




Figure 5.1.2 



Computing Eigenvalues and Eigenvectors 

Our next objective is to obtain a general procedure for finding eigenvalues and eigenvectors of an $n \times n$
matrix A. We will begin with the problem of finding the eigenvalues of A. Note first that the equation
$A\mathbf{x} = \lambda\mathbf{x}$ can be rewritten as $A\mathbf{x} = \lambda I\mathbf{x}$, or equivalently, as

\[ (\lambda I - A)\mathbf{x} = \mathbf{0} \]

For $\lambda$ to be an eigenvalue of A this equation must have a nonzero solution for $\mathbf{x}$. But it follows from parts (b)
and (g) of Theorem 4.10.4 that this is so if and only if the coefficient matrix $\lambda I - A$ has a zero determinant.
Thus, we have the following result.

THEOREM 5.1.1

If A is an $n \times n$ matrix, then $\lambda$ is an eigenvalue of A if and only if it satisfies the equation

\[ \det(\lambda I - A) = 0 \qquad (1) \]

This is called the characteristic equation of A.



EXAMPLE 2 Finding Eigenvalues

In Example 1 we observed that $\lambda = 3$ is an eigenvalue of the matrix

\[ A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \]

but we did not explain how we found it. Use the characteristic equation to find all eigenvalues
of this matrix.

Solution It follows from Formula 1 that the eigenvalues of A are the solutions of the equation
$\det(\lambda I - A) = 0$, which we can write as

\[ \begin{vmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{vmatrix} = 0 \]

from which we obtain

\[ (\lambda - 3)(\lambda + 1) = 0 \qquad (2) \]

This shows that the eigenvalues of A are $\lambda = 3$ and $\lambda = -1$. Thus, in addition to the
eigenvalue $\lambda = 3$ noted in Example 1, we have discovered a second eigenvalue $\lambda = -1$.
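For a $2 \times 2$ matrix with rows $(a, b)$ and $(c, d)$, expanding $\det(\lambda I - A)$ gives $\lambda^2 - (a + d)\lambda + (ad - bc)$, so the eigenvalues follow from the quadratic formula. The sketch below applies this to the matrix of Example 2; the function name `eigenvalues_2x2` is illustrative, and the code handles only the case of real eigenvalues.

```python
import math


def eigenvalues_2x2(A):
    """Real eigenvalues of a 2 x 2 matrix from its characteristic equation.

    det(lambda*I - A) = lambda^2 - (a + d)*lambda + (a*d - b*c).
    """
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex; not handled in this sketch")
    root = math.sqrt(disc)
    return ((trace + root) / 2, (trace - root) / 2)


A = [[3, 0], [8, -1]]
eigs = eigenvalues_2x2(A)
print(eigs)
```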



When the determinant $\det(\lambda I - A)$ that appears on the left side of 1 is expanded, the result is a polynomial
$p(\lambda)$ of degree n that is called the characteristic polynomial of A. For example, it follows from 2 that the
characteristic polynomial of the $2 \times 2$ matrix A in Example 2 is

\[ p(\lambda) = (\lambda - 3)(\lambda + 1) = \lambda^2 - 2\lambda - 3 \]

which is a polynomial of degree 2. In general, the characteristic polynomial of an $n \times n$ matrix has the form

\[ p(\lambda) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n \]

in which the coefficient of $\lambda^n$ is 1 (Exercise 17). Since a polynomial of degree n has at most n distinct roots, it
follows that the equation

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \qquad (3) \]

has at most n distinct solutions and consequently that an $n \times n$ matrix has at most n distinct eigenvalues. Since
some of these solutions may be complex numbers, it is possible for a matrix to have complex eigenvalues,
even if that matrix itself has real entries. We will discuss this issue in more detail later, but for now we will
focus on examples in which the eigenvalues are real numbers.

EXAMPLE 3 Eigenvalues of a 3 × 3 Matrix

Find the eigenvalues of

\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix} \]

Solution The characteristic polynomial of A is

\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda & -1 & 0 \\ 0 & \lambda & -1 \\ -4 & 17 & \lambda - 8 \end{bmatrix} = \lambda^3 - 8\lambda^2 + 17\lambda - 4 \]

The eigenvalues of A must therefore satisfy the cubic equation

\[ \lambda^3 - 8\lambda^2 + 17\lambda - 4 = 0 \qquad (4) \]

To solve this equation, we will begin by searching for integer solutions. This task can be
simplified by exploiting the fact that all integer solutions (if there are any) of a polynomial
equation with integer coefficients

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \]

must be divisors of the constant term, $c_n$. Thus, the only possible integer solutions of (4) are the
divisors of $-4$, that is, $\pm 1, \pm 2, \pm 4$. Successively substituting these values in (4) shows that
$\lambda = 4$ is an integer solution. As a consequence, $\lambda - 4$ must be a factor of the left side of (4).
Dividing $\lambda - 4$ into $\lambda^3 - 8\lambda^2 + 17\lambda - 4$ shows that (4) can be rewritten as

\[ (\lambda - 4)(\lambda^2 - 4\lambda + 1) = 0 \]

Thus, the remaining solutions of (4) satisfy the quadratic equation

\[ \lambda^2 - 4\lambda + 1 = 0 \]

which can be solved by the quadratic formula. Thus the eigenvalues of A are

\[ \lambda = 4, \quad \lambda = 2 + \sqrt{3}, \quad \text{and} \quad \lambda = 2 - \sqrt{3} \]

In applications involving large matrices it is often not feasible to compute the
characteristic equation directly so other methods must be used to find
eigenvalues. We will consider such methods in Chapter 9.
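The search strategy of this example (try the divisors of the constant term, divide out the root that is found, then solve the remaining quadratic) can be sketched in a few lines. The helper names `poly_eval`, `integer_roots`, and `deflate` are illustrative, and the code assumes a monic polynomial with integer coefficients and a nonzero constant term, as in Equation (4).

```python
import math


def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs are listed from the highest degree down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def integer_roots(coeffs):
    """Integer roots of an integer polynomial, found among the divisors of the constant term."""
    constant = abs(coeffs[-1])
    if constant == 0:
        raise ValueError("this sketch assumes a nonzero constant term")
    divisors = [d for d in range(1, constant + 1) if constant % d == 0]
    candidates = [sign * d for d in divisors for sign in (1, -1)]
    return sorted(r for r in candidates if poly_eval(coeffs, r) == 0)


def deflate(coeffs, root):
    """Synthetic division by (lambda - root); returns the quotient's coefficients."""
    quotient = [coeffs[0]]
    for c in coeffs[1:-1]:
        quotient.append(c + root * quotient[-1])
    return quotient


coeffs = [1, -8, 17, -4]           # lambda^3 - 8 lambda^2 + 17 lambda - 4
roots = integer_roots(coeffs)
quotient = deflate(coeffs, roots[0])

# Solve the remaining quadratic a*lambda^2 + b*lambda + c = 0.
a, b, c = quotient
disc = b * b - 4 * a * c
others = ((-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a))
print(roots, quotient, others)
```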



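EXAMPLE 4 Eigenvalues of an Upper Triangular Matrix

Find the eigenvalues of the upper triangular matrix

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix} \]

Solution Recalling that the determinant of a triangular matrix is the product of the entries on
the main diagonal (Theorem 2.1.2), we obtain

\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda - a_{11} & -a_{12} & -a_{13} & -a_{14} \\ 0 & \lambda - a_{22} & -a_{23} & -a_{24} \\ 0 & 0 & \lambda - a_{33} & -a_{34} \\ 0 & 0 & 0 & \lambda - a_{44} \end{bmatrix} = (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) \]

Thus, the characteristic equation is

\[ (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) = 0 \]

and the eigenvalues are

\[ \lambda = a_{11}, \quad \lambda = a_{22}, \quad \lambda = a_{33}, \quad \lambda = a_{44} \]

which are precisely the diagonal entries of A.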



The following general theorem should be evident from the computations in the preceding example. 



THEOREM 5.1.2 



If A is an $n \times n$ triangular matrix (upper triangular, lower triangular, or diagonal), then the eigenvalues
of A are the entries on the main diagonal of A.



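EXAMPLE 5 Eigenvalues of a Lower Triangular Matrix

By inspection, the eigenvalues of the lower triangular matrix

\[ A = \begin{bmatrix} \tfrac{1}{2} & 0 & 0 \\ -1 & \tfrac{2}{3} & 0 \\ 5 & -8 & -\tfrac{1}{4} \end{bmatrix} \]

are $\lambda = \tfrac{1}{2}$, $\lambda = \tfrac{2}{3}$, and $\lambda = -\tfrac{1}{4}$.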



Had Theorem 5.1.2 been available earlier, we 
could have anticipated the result obtained in 
Example 2. 



THEOREM 5.1.3 



If A is an $n \times n$ matrix, the following statements are equivalent.

(a) $\lambda$ is an eigenvalue of A.

(b) The system of equations $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has nontrivial solutions.

(c) There is a nonzero vector $\mathbf{x}$ such that $A\mathbf{x} = \lambda\mathbf{x}$.

(d) $\lambda$ is a solution of the characteristic equation $\det(\lambda I - A) = 0$.



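Finding Eigenvectors and Bases for Eigenspaces

Now that we know how to find the eigenvalues of a matrix, we will consider the problem of finding the
corresponding eigenvectors. Since the eigenvectors corresponding to an eigenvalue $\lambda$ of a matrix A are the
nonzero vectors that satisfy the equation

\[ (\lambda I - A)\mathbf{x} = \mathbf{0} \]

these eigenvectors are the nonzero vectors in the null space of the matrix $\lambda I - A$. We call this null space the
eigenspace of A corresponding to $\lambda$. Stated another way, the eigenspace of A corresponding to the eigenvalue
$\lambda$ is the solution space of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$.

Notice that $\mathbf{x} = \mathbf{0}$ is in every eigenspace even
though it is not an eigenvector. Thus, it is the
nonzero vectors in an eigenspace that are the
eigenvectors.

EXAMPLE 6 Bases for Eigenspaces

Find bases for the eigenspaces of the matrix

\[ A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \]

Solution In Example 2 we found the characteristic equation of A to be

\[ (\lambda - 3)(\lambda + 1) = 0 \]

from which we obtained the eigenvalues $\lambda = 3$ and $\lambda = -1$. Thus, there are two eigenspaces
of A, one corresponding to each of these eigenvalues.

By definition,

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

is an eigenvector of A corresponding to an eigenvalue $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution
of $(\lambda I - A)\mathbf{x} = \mathbf{0}$, that is, of

\[ \begin{bmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

If $\lambda = 3$, then this equation becomes

\[ \begin{bmatrix} 0 & 0 \\ -8 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

whose general solution is

\[ x_1 = \tfrac{1}{2}t, \quad x_2 = t \]

(verify) or in matrix form,

\[ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2}t \\ t \end{bmatrix} = t \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \]

Thus,

\[ \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to $\lambda = 3$. We leave it as an exercise for you to
follow the pattern of these computations and show that

\[ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to $\lambda = -1$.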
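A claimed eigenvalue and eigenvector can always be checked directly against the defining equation $A\mathbf{x} = \lambda\mathbf{x}$. The following sketch does this for the two basis vectors found in this example, using exact fractions; the function name `is_eigenpair` is an illustrative choice.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def is_eigenpair(A, lam, x):
    """Return True if x is nonzero and A x equals lam times x."""
    if all(xi == 0 for xi in x):
        return False  # the zero vector is never an eigenvector
    return mat_vec(A, x) == [lam * xi for xi in x]


A = [[3, 0], [8, -1]]
print(is_eigenpair(A, 3, [Fraction(1, 2), 1]))   # basis vector for lambda = 3
print(is_eigenpair(A, -1, [0, 1]))               # basis vector for lambda = -1
print(is_eigenpair(A, 3, [1, 1]))                # not an eigenvector
```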




Historical Note Methods of linear algebra are used in the emerging field of computerized face 
recognition. Researchers are working with the idea that every human face in a racial group is a 
combination of a few dozen primary shapes. For example, by analyzing three-dimensional scans of 
many faces, researchers at Rockefeller University have produced both an average head shape in the 



Caucasian group — dubbed the meanhead (top row left in the figure to the left) — and a set of 
standardized variations from that shape, called eigenheads (15 of which are shown in the picture). 
These are so named because they are eigenvectors of a certain matrix that stores digitized facial 
information. Face shapes are represented mathematically as linear combinations of the eigenheads. 
[Image: Courtesy Dr. Joseph Atick, Dr Norman Redlich, and Dr Paul Griffith] 



EXAMPLE 7 Eigenvectors and Bases for Eigenspaces M 

Find bases for the eigenspaces of 



A = 



0 0-2 

1 2 1 
1 0 3 



Solution The characteristic equation of ^ is A"^ — 5A^ + 8A — 4 = 0? or in factored form, 
(A — 1) (A — 2) = 0 (verify). Thus, the distinct eigenvalues of v4 are A = 1 and A = 2? so there 
are two eigenspaces of A. 

By definition. 



x = 



^1 

^3 



is an eigenvector of A corresponding to ,\ if and only if x is a nontrivial solution of 
(A/ — -4) X = 0, or in matrix form. 



AO 2 " 






"0" 


1 A-2 -1 






0 


1 0 A-3 


X3 




0 



(5) 



In the case where A = 2? Formula 5 becomes 



2 


0 


2" 






'0" 


1 


0 


-1 






0 


1 


0 


-1 


^3 




0 



Solving this system using Gaussian elimination yields (verify) 

Thus, the eigenvectors of A corresponding to ^ = 2 are the nonzero vectors of the form 





' —s' 




' —s' 




0 




-1 




0 


X = 


t 




0 


+ 


t 


= s 


0 


■¥t 


1 




s 








0 




1 




0 



Since 



■-r 




'0" 


0 


and 


1 


1 




0 



are linearly independent (why?), these vectors form a basis for the eigenspace corresponding to 
A=2. 

If = 1 , then 5 becomes 



1 


0 


2" 






'0' 


-1 


-1 


-1 






0 


-1 


0 


-2 


X3 




0 



Solving this system yields (verify) 

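Thus, the eigenvectors corresponding to $\lambda = 1$ are the nonzero vectors of the form

$$\begin{bmatrix} -2s \\ s \\ s \end{bmatrix} = s\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \quad \text{so that} \quad \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = 1$.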
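The bases found in Example 7 can be checked by direct multiplication. The following is a minimal sketch in plain Python; the helper names `mat_vec` and `is_eigenpair` are illustrative choices, not part of the text.

```python
# Matrix A from Example 7, stored as a list of rows.
A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]

def mat_vec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def is_eigenpair(M, lam, v):
    """True if v is nonzero and M v equals lam * v."""
    return any(v) and mat_vec(M, v) == [lam * x for x in v]

# Basis vectors found in Example 7 for each eigenspace.
basis_lambda_2 = [[-1, 0, 1], [0, 1, 0]]
basis_lambda_1 = [[-2, 1, 1]]

checks = ([is_eigenpair(A, 2, v) for v in basis_lambda_2]
          + [is_eigenpair(A, 1, v) for v in basis_lambda_1])
print(checks)
```

Each entry of `checks` reports whether the corresponding vector satisfies Ax = λx. This confirms that the vectors are eigenvectors; it does not by itself confirm that they span the eigenspace.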



Powers of a Matrix 

Once the eigenvalues and eigenvectors of a matrix A are found, it is a simple matter to find the eigenvalues
and eigenvectors of any positive integer power of A; for example, if $\lambda$ is an eigenvalue of A and x is a
corresponding eigenvector, then

$$A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}$$

which shows that $\lambda^2$ is an eigenvalue of $A^2$ and that x is a corresponding eigenvector. In general, we have the
following result.

THEOREM 5.1.4

If k is a positive integer, $\lambda$ is an eigenvalue of a matrix A, and x is a corresponding eigenvector, then
$\lambda^k$ is an eigenvalue of $A^k$ and x is a corresponding eigenvector.



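EXAMPLE 8 Powers of a Matrix

In Example 7 we showed that the eigenvalues of

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

are $\lambda = 2$ and $\lambda = 1$, so from Theorem 5.1.4 both $\lambda = 2^7 = 128$ and $\lambda = 1^7 = 1$ are eigenvalues of
$A^7$. We also showed that

$$\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

are eigenvectors of A corresponding to the eigenvalue $\lambda = 2$, so from Theorem 5.1.4 they are also
eigenvectors of $A^7$ corresponding to $\lambda = 2^7 = 128$. Similarly, the eigenvector

$$\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

of A corresponding to the eigenvalue $\lambda = 1$ is also an eigenvector of $A^7$ corresponding to
$\lambda = 1^7 = 1$.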
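As a sanity check on Example 8, one can form the seventh power of A by repeated multiplication and apply it to each eigenvector. This is only a sketch; `mat_mul`, `mat_pow`, and `mat_vec` are hypothetical helper names introduced for illustration.

```python
A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(M, k):
    """M raised to the positive integer power k by repeated multiplication."""
    result = M
    for _ in range(k - 1):
        result = mat_mul(result, M)
    return result

def mat_vec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A7 = mat_pow(A, 7)

# (eigenvalue of A, eigenvector) pairs from Example 7.
pairs = [(2, [-1, 0, 1]), (2, [0, 1, 0]), (1, [-2, 1, 1])]

# Theorem 5.1.4 predicts that A^7 v equals (lambda^7) v for each pair.
results = [mat_vec(A7, v) == [lam ** 7 * x for x in v] for lam, v in pairs]
print(results)
```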



Eigenvalues and Invertibility 

The next theorem establishes a relationship between eigenvalues and the invertibility of a matrix. 



THEOREM 5.1.5 

A square matrix A is invertible if and only if $\lambda = 0$ is not an eigenvalue of A.

Proof Assume that A is an $n \times n$ matrix and observe first that $\lambda = 0$ is a solution of the characteristic
equation

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$$

if and only if the constant term $c_n$ is zero. Thus, it suffices to prove that A is invertible if and only if $c_n \neq 0$.
But

$$\det(\lambda I - A) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n$$

or, on setting $\lambda = 0$,

$$\det(-A) = c_n \quad \text{or} \quad (-1)^n \det(A) = c_n$$

It follows from the last equation that $\det(A) = 0$ if and only if $c_n = 0$, and this in turn implies that A is
invertible if and only if $c_n \neq 0$.



EXAMPLE 9 Eigenvalues and Invertibility

The matrix A in Example 7 is invertible since it has eigenvalues $\lambda = 1$ and $\lambda = 2$, neither of which
is zero. We leave it for you to check this conclusion by showing that $\det(A) \neq 0$.
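The check suggested in Example 9 can be sketched in a few lines. The code below evaluates det(A) by cofactor expansion and also evaluates det(λI - A) at λ = 0, which by the proof of Theorem 5.1.5 is the constant term of the characteristic polynomial. The function names `det3` and `char_poly_at` are assumptions made for this illustration.

```python
A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_at(M, lam):
    """Value of det(lam*I - M) for a 3 x 3 matrix M."""
    shifted = [[(lam if r == c else 0) - M[r][c] for c in range(3)]
               for r in range(3)]
    return det3(shifted)

det_A = det3(A)                      # nonzero, so A is invertible
constant_term = char_poly_at(A, 0)   # equals (-1)^3 * det(A)
print(det_A, constant_term, char_poly_at(A, 1), char_poly_at(A, 2))
```

The last two printed values are zero because λ = 1 and λ = 2 are roots of the characteristic polynomial.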



More on the Equivalence Theorem 

As our final result in this section, we will use Theorem 5.1.5 to add one additional part to Theorem 4.10.4. 

THEOREM 5.1.6 Equivalent Statements

If A is an $n \times n$ matrix, then the following statements are equivalent.

(a) A is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix b.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix b.

(g) $\det(A) \neq 0$.

(h) The column vectors of A are linearly independent.

(i) The row vectors of A are linearly independent.

(j) The column vectors of A span $R^n$.

(k) The row vectors of A span $R^n$.

(l) The column vectors of A form a basis for $R^n$.

(m) The row vectors of A form a basis for $R^n$.

(n) A has rank n.

(o) A has nullity 0.

(p) The orthogonal complement of the null space of A is $R^n$.

(q) The orthogonal complement of the row space of A is $\{\mathbf{0}\}$.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.

(t) $\lambda = 0$ is not an eigenvalue of A.

This theorem relates all of the major topics we have studied thus far.



Concept Review 

• Eigenvector

• Eigenvalue

• Characteristic equation

• Characteristic polynomial

• Eigenspace

• Equivalence Theorem

Skills 

• Find the eigenvalues of a matrix. 

• Find bases for the eigenspaces of a matrix. 



Exercise Set 5.1 



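In Exercises 1-2, confirm by multiplication that x is an eigenvector of A, and find the corresponding
eigenvalue.

1. $A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{bmatrix}; \quad \mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$

Answer:

5

2. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}; \quad \mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$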



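3. Find the characteristic equations of the following matrices:

(a) $\begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 10 & -9 \\ 4 & -2 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 3 \\ 4 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} -2 & -7 \\ 1 & 2 \end{bmatrix}$

(e) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Answer:

(a) $\lambda^2 - 2\lambda - 3 = 0$

(b) $\lambda^2 - 8\lambda + 16 = 0$

(c) $\lambda^2 - 12 = 0$

(d) $\lambda^2 + 3 = 0$

(e) $\lambda^2 = 0$

(f) $\lambda^2 - 2\lambda + 1 = 0$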

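4. Find the eigenvalues of the matrices in Exercise 3.

5. Find bases for the eigenspaces of the matrices in Exercise 3.

Answer:

(a) Basis for eigenspace corresponding to $\lambda = 3$: $\begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$; basis for eigenspace corresponding to $\lambda = -1$: $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(b) Basis for eigenspace corresponding to $\lambda = 4$: $\begin{bmatrix} 3/2 \\ 1 \end{bmatrix}$

(c) Basis for eigenspace corresponding to $\lambda = \sqrt{12}$: $\begin{bmatrix} 3/\sqrt{12} \\ 1 \end{bmatrix}$; basis for eigenspace corresponding to $\lambda = -\sqrt{12}$: $\begin{bmatrix} -3/\sqrt{12} \\ 1 \end{bmatrix}$

(d) There are no eigenspaces.

(e) Basis for eigenspace corresponding to $\lambda = 0$: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(f) Basis for eigenspace corresponding to $\lambda = 1$: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}$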

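6. Find the characteristic equations of the following matrices:

(a) $\begin{bmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 3 & 0 & -5 \\ \tfrac{1}{5} & -1 & 0 \\ 1 & 1 & -2 \end{bmatrix}$

(c) $\begin{bmatrix} -2 & 0 & 1 \\ -6 & -2 & 0 \\ 19 & 5 & -4 \end{bmatrix}$

(d) $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 3 & 0 \\ -4 & 13 & -1 \end{bmatrix}$

(e) $\begin{bmatrix} 5 & 0 & 1 \\ 1 & 1 & 0 \\ -7 & 1 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{bmatrix}$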



7. Find the eigenvalues of the matrices in Exercise 6.

Answer:

(a) 1, 2, 3

(b) $-\sqrt{2}, 0, \sqrt{2}$

(c) -8

(d) 2

(e) 2

(f) -4, 3

8. Find bases for the eigenspaces of the matrices in Exercise 6. 

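9. Find the characteristic equations of the following matrices:

(a) $\begin{bmatrix} 0 & 0 & 2 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 10 & -9 & 0 & 0 \\ 4 & -2 & 0 & 0 \\ 0 & 0 & -2 & -7 \\ 0 & 0 & 1 & 2 \end{bmatrix}$

Answer:

(a) $\lambda^4 + \lambda^3 - 3\lambda^2 - \lambda + 2 = 0$

(b) $\lambda^4 - 8\lambda^3 + 19\lambda^2 - 24\lambda + 48 = 0$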

10. Find the eigenvalues of the matrices in Exercise 9. 

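11. Find bases for the eigenspaces of the matrices in Exercise 9.

Answer:

(a) $\lambda = 1$: basis $\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 1 \\ 0 \end{bmatrix}$; $\lambda = -2$: basis $\begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$; $\lambda = -1$: basis $\begin{bmatrix} -2 \\ 1 \\ 1 \\ 0 \end{bmatrix}$

(b) $\lambda = 4$: basis $\begin{bmatrix} 3/2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$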



12. By inspection, find the eigenvalues of the following matrices: 
(a) r-1 6" 



(b) 



(c) 



0 5 

3 0 0 
-2 7 0 

4 8 1 



—k 0 0 0 



0-^00 



0 
0 



0 1 
0 0 



0 

i 

2 



13. Find the eigenvalues of for 



A = 



3 7 11 

^ 3 8 

0 4 

0 2 



0 
0 



Answer: 



14. Find the eigenvalues and bases for the eigenspaces of for 

/-I -2 -2 

A = 



1 2 
-1 -1 



15. Let A be a $2 \times 2$ matrix, and call a line through the origin of $R^2$ invariant under A if $A\mathbf{x}$ lies on the line
when x does. Find equations for all lines in $R^2$, if any, that are invariant under the given matrix.

A: 



=K -1] 



Answer: 

(a) y = x and y = 2x

(b) No lines 

(c) y = 0 

16. Find $\det(A)$ given that A has $p(\lambda)$ as its characteristic polynomial.

(a) $p(\lambda) = \lambda^3 - 2\lambda^2 + \lambda + 5$

(b) $p(\lambda) = \lambda^4 - \lambda^3 + 7$

[Hint: See the proof of Theorem 5.1.5.]

17. Let A be an $n \times n$ matrix.

(a) Prove that the characteristic polynomial of A has degree n.

(b) Prove that the coefficient of $\lambda^n$ in the characteristic polynomial is 1.

18. Show that the characteristic equation of a $2 \times 2$ matrix A can be expressed as $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$,
where $\operatorname{tr}(A)$ is the trace of A.

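19. Use the result in Exercise 18 to show that if

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

then the solutions of the characteristic equation of A are

$$\lambda = \tfrac{1}{2}\left[(a + d) \pm \sqrt{(a - d)^2 + 4bc}\right]$$

Use this result to show that A has

(a) two distinct real eigenvalues if $(a - d)^2 + 4bc > 0$.

(b) two repeated real eigenvalues if $(a - d)^2 + 4bc = 0$.

(c) complex conjugate eigenvalues if $(a - d)^2 + 4bc < 0$.

20. Let A be the matrix in Exercise 19. Show that if $b \neq 0$, then

$$\mathbf{x}_1 = \begin{bmatrix} -b \\ a - \lambda_1 \end{bmatrix} \quad \text{and} \quad \mathbf{x}_2 = \begin{bmatrix} -b \\ a - \lambda_2 \end{bmatrix}$$

are eigenvectors of A that correspond, respectively, to the eigenvalues

$$\lambda_1 = \tfrac{1}{2}\left[(a + d) + \sqrt{(a - d)^2 + 4bc}\right]$$

and

$$\lambda_2 = \tfrac{1}{2}\left[(a + d) - \sqrt{(a - d)^2 + 4bc}\right]$$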

21. Use the result of Exercise 18 to prove that if $p(\lambda)$ is the characteristic polynomial of a $2 \times 2$ matrix A,
then $p(A) = 0$.

22. Prove: If a, b, c, and d are integers such that $a + b = c + d$, then

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

has integer eigenvalues, namely, $\lambda_1 = a + b$ and $\lambda_2 = a - c$.

23. Prove: If $\lambda$ is an eigenvalue of an invertible matrix A, and x is a corresponding eigenvector, then $1/\lambda$ is
an eigenvalue of $A^{-1}$ and x is a corresponding eigenvector.

24. Prove: If $\lambda$ is an eigenvalue of A, x is a corresponding eigenvector, and s is a scalar, then $\lambda - s$ is an
eigenvalue of $A - sI$ and x is a corresponding eigenvector.

25. Prove: If $\lambda$ is an eigenvalue of A and x is a corresponding eigenvector, then $s\lambda$ is an eigenvalue of $sA$ for
every scalar s, and x is a corresponding eigenvector.

26. Find the eigenvalues and bases for the eigenspaces of

$$A = \begin{bmatrix} -2 & 2 & 3 \\ -2 & 3 & 2 \\ -4 & 2 & 5 \end{bmatrix}$$

and then use Exercises 23 and 24 to find the eigenvalues and bases for the eigenspaces of

(a) $A^{-1}$

(b) $A - 3I$

(c) $A + 2I$

27. (a) Prove that if A is a square matrix, then A and $A^T$ have the same eigenvalues. [Hint: Look at the
characteristic equation $\det(\lambda I - A) = 0$.]

(b) Show that A and $A^T$ need not have the same eigenspaces. [Hint: Use the result in Exercise 20 to find
a $2 \times 2$ matrix for which A and $A^T$ have different eigenspaces.]

28. Suppose that the characteristic polynomial of some matrix A is found to be
$p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part, answer the question and explain your reasoning.

(a) What is the size of A?

(b) Is A invertible?

(c) How many eigenspaces does A have?

29. The eigenvectors that we have been studying are sometimes called right eigenvectors to distinguish them
from left eigenvectors, which are $n \times 1$ column matrices x that satisfy the equation $\mathbf{x}^T A = \mu\mathbf{x}^T$ for some
scalar $\mu$. What is the relationship, if any, between the right eigenvectors and corresponding eigenvalues $\lambda$
of A and the left eigenvectors and corresponding eigenvalues $\mu$ of A?

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) If A is a square matrix and $A\mathbf{x} = \lambda\mathbf{x}$ for some nonzero scalar $\lambda$, then x is an eigenvector of A.

Answer:

False

(b) If $\lambda$ is an eigenvalue of a matrix A, then the linear system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has only the trivial solution.

Answer:

False

(c) If the characteristic polynomial of a matrix A is $p(\lambda) = \lambda^2 + 1$, then A is invertible.

Answer:

True

(d) If $\lambda$ is an eigenvalue of a matrix A, then the eigenspace of A corresponding to $\lambda$ is the set of eigenvectors
of A corresponding to $\lambda$.

Answer:

False

(e) If 0 is an eigenvalue of a matrix A, then $A^2$ is singular.

Answer:

True

(f) The eigenvalues of a matrix A are the same as the eigenvalues of the reduced row echelon form of A.

Answer:

False

(g) If 0 is an eigenvalue of a matrix A, then the set of columns of A is linearly independent.

Answer:

False



Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



5.2 Diagonalization 

In this section we will be concerned with the problem of finding a basis for $R^n$ that consists of eigenvectors of an
$n \times n$ matrix A. Such bases can be used to study geometric properties of A and to simplify various numerical
computations. These bases are also of physical significance in a wide variety of applications, some of which will be
considered later in this text.



The Matrix Diagonalization Problem 

Our first objective in this section is to show that the following two seemingly different problems are equivalent. 

Problem 1 Given an $n \times n$ matrix A, does there exist an invertible matrix P such that $P^{-1}AP$ is diagonal?

Problem 2 Given an $n \times n$ matrix A, does A have n linearly independent eigenvectors?



Similarity 



The matrix product $P^{-1}AP$ that appears in Problem 1 is called a similarity transformation of the matrix A. Such
products are important in the study of eigenvectors and eigenvalues, so we will begin with some terminology about
them.

DEFINITION 1

If A and B are square matrices, then we say that B is similar to A if there is an invertible matrix P such that
$B = P^{-1}AP$.

Note that if B is similar to A, then it is also true that A is similar to B, since we can express A as $A = Q^{-1}BQ$ by
taking $Q = P^{-1}$. This being the case, we will usually say that A and B are similar matrices if either is similar to
the other.



Similarity Invariants 

Similar matrices have many properties in common. For example, if $B = P^{-1}AP$, then it follows that A and B have
the same determinant, since

$$\det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)}\det(A)\det(P) = \det(A)$$

In general, any property that is shared by all similar matrices is called a similarity invariant or is said to be
invariant under similarity. Table 1 lists the most important similarity invariants. The proofs of some of these
results are given as exercises.



Table 1 Similarity Invariants

Property                     Description

Determinant                  A and $P^{-1}AP$ have the same determinant.

Invertibility                A is invertible if and only if $P^{-1}AP$ is invertible.

Rank                         A and $P^{-1}AP$ have the same rank.

Nullity                      A and $P^{-1}AP$ have the same nullity.

Trace                        A and $P^{-1}AP$ have the same trace.

Characteristic polynomial    A and $P^{-1}AP$ have the same characteristic polynomial.

Eigenvalues                  A and $P^{-1}AP$ have the same eigenvalues.

Eigenspace dimension         If $\lambda$ is an eigenvalue of A and hence of $P^{-1}AP$, then the eigenspace of A
                             corresponding to $\lambda$ and the eigenspace of $P^{-1}AP$ corresponding to $\lambda$ have the
                             same dimension.
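Two of the invariants in Table 1 are easy to check numerically. The sketch below uses a small 2 × 2 example that is hypothetical (the matrices A and P are chosen here for illustration and do not come from the text) and compares the determinant and trace of A with those of the similar matrix.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))

# Hypothetical example matrices (not from the text).
A = [[1, 2], [3, 4]]
P = [[1, 1], [0, 1]]
P_inv = [[1, -1], [0, 1]]  # inverse of P; P_inv * P is the identity

# B is similar to A by construction.
B = mat_mul(P_inv, mat_mul(A, P))
print(B, det2(A), det2(B), trace(A), trace(B))
```

A single example cannot prove invariance, of course; it only illustrates what the table asserts.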



Expressed in the language of similarity, Problem 1 posed above is equivalent to asking whether the matrix A is 
similar to a diagonal matrix. If so, the diagonal matrix will have all of the similarity-invariant properties of A, but 
will have a simpler form, making it easier to analyze and work with. This important idea has some associated 
terminology. 

DEFINITION 2

A square matrix A is said to be diagonalizable if it is similar to some diagonal matrix; that is, if there exists
an invertible matrix P such that $P^{-1}AP$ is diagonal. In this case the matrix P is said to diagonalize A.

The following theorem shows that Problems 1 and 2 posed above are actually two different forms of the same 
mathematical problem. 



THEOREM 5.2.1 

If A is an $n \times n$ matrix, the following statements are equivalent.

(a) A is diagonalizable.

(b) A has n linearly independent eigenvectors.

Part (b) of Theorem 5.2.1 is equivalent to saying that there is a basis for $R^n$ consisting of
eigenvectors of A. Why?



□ 



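Proof (a) ⇒ (b) Since A is assumed to be diagonalizable, it follows that there exist an invertible matrix P and a
diagonal matrix D such that $P^{-1}AP = D$ or, equivalently,

$$AP = PD \tag{1}$$

If we denote the column vectors of P by $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, and if we assume that the diagonal entries of D are
$\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Formula 6 of Section 1.3 the left side of 1 can be expressed as

$$AP = A[\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n] = [A\mathbf{p}_1 \;\; A\mathbf{p}_2 \;\; \cdots \;\; A\mathbf{p}_n]$$

and, as noted in the comment following Example 1 of Section 1.7, the right side of 1 can be expressed as

$$PD = [\lambda_1\mathbf{p}_1 \;\; \lambda_2\mathbf{p}_2 \;\; \cdots \;\; \lambda_n\mathbf{p}_n]$$

Thus, it follows from 1 that

$$A\mathbf{p}_1 = \lambda_1\mathbf{p}_1, \quad A\mathbf{p}_2 = \lambda_2\mathbf{p}_2, \quad \ldots, \quad A\mathbf{p}_n = \lambda_n\mathbf{p}_n \tag{2}$$

Since P is invertible, we know from Theorem 5.1.6 that its column vectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ are linearly independent
(and hence nonzero). Thus, it follows from 2 that these n column vectors are eigenvectors of A.

Proof (b) ⇒ (a) Assume that A has n linearly independent eigenvectors, $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, and that $\lambda_1, \lambda_2, \ldots, \lambda_n$ are
the corresponding eigenvalues. If we let

$$P = [\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n]$$

and if we let D be the diagonal matrix that has $\lambda_1, \lambda_2, \ldots, \lambda_n$ as its successive diagonal entries, then

$$AP = A[\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n] = [A\mathbf{p}_1 \;\; A\mathbf{p}_2 \;\; \cdots \;\; A\mathbf{p}_n] = [\lambda_1\mathbf{p}_1 \;\; \lambda_2\mathbf{p}_2 \;\; \cdots \;\; \lambda_n\mathbf{p}_n] = PD$$

Since the column vectors of P are linearly independent, it follows from Theorem 5.1.6 that P is invertible, so that
this last equation can be rewritten as $P^{-1}AP = D$, which shows that A is diagonalizable.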



Procedure for Diagonalizing a Matrix 

The preceding theorem guarantees that an $n \times n$ matrix A with n linearly independent eigenvectors is
diagonalizable, and the proof suggests the following method for diagonalizing A.

Procedure for Diagonalizing a Matrix

Step 1. Confirm that the matrix is actually diagonalizable by finding n linearly independent eigenvectors.
One way to do this is by finding a basis for each eigenspace and merging these basis vectors into a single
set S. If this set has fewer than n vectors, then the matrix is not diagonalizable.

Step 2. Form the matrix $P = [\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n]$ that has the vectors in S as its column vectors.

Step 3. The matrix $P^{-1}AP$ will be diagonal and have the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ corresponding to the
eigenvectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ as its successive diagonal entries.



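EXAMPLE 1 Finding a Matrix P That Diagonalizes a Matrix A

Find a matrix P that diagonalizes

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

Solution In Example 7 of the preceding section we found the characteristic equation of A to be

$$(\lambda - 1)(\lambda - 2)^2 = 0$$

and we found the following bases for the eigenspaces:

$$\lambda = 2: \;\; \mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \;\; \mathbf{p}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = 1: \;\; \mathbf{p}_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

There are three basis vectors in total, so the matrix

$$P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

diagonalizes A. As a check, you should verify that

$$P^{-1}AP = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$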
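The verification requested at the end of Example 1 can be sketched as follows, using exact integer arithmetic. The helper name `mat_mul` is an illustrative choice; the matrices are those of Example 1.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]

# Columns of P are the eigenvectors p1, p2 (lambda = 2) and p3 (lambda = 1).
P = [[-1, 0, -2],
     [0, 1, 1],
     [1, 0, 1]]

# Inverse of P as displayed in Example 1.
P_inv = [[1, 0, 2],
         [1, 1, 1],
         [-1, 0, -1]]

identity_check = mat_mul(P_inv, P)   # should be the 3 x 3 identity
D = mat_mul(P_inv, mat_mul(A, P))    # should be diagonal
print(identity_check)
print(D)
```

The diagonal entries of `D` appear in the same order as the eigenvectors were placed in the columns of `P`.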



In general, there is no preferred order for the columns of P. Since the ith diagonal entry of $P^{-1}AP$ is an eigenvalue
for the ith column vector of P, changing the order of the columns of P just changes the order of the eigenvalues on
the diagonal of $P^{-1}AP$. Thus, had we written

$$P = \begin{bmatrix} -1 & -2 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$

in the preceding example, we would have obtained

$$P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$


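EXAMPLE 2 A Matrix That Is Not Diagonalizable

Find a matrix P that diagonalizes

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -3 & 5 & 2 \end{bmatrix}$$

Solution The characteristic polynomial of A is

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & 0 & 0 \\ -1 & \lambda - 2 & 0 \\ 3 & -5 & \lambda - 2 \end{vmatrix} = (\lambda - 1)(\lambda - 2)^2$$

so the characteristic equation is

$$(\lambda - 1)(\lambda - 2)^2 = 0$$

Thus, the distinct eigenvalues of A are $\lambda = 1$ and $\lambda = 2$. We leave it for you to show that bases for
the eigenspaces are

$$\lambda = 1: \;\; \mathbf{p}_1 = \begin{bmatrix} 1/8 \\ -1/8 \\ 1 \end{bmatrix}; \qquad \lambda = 2: \;\; \mathbf{p}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Since A is a $3 \times 3$ matrix and there are only two basis vectors in total, A is not diagonalizable.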



Alternative Solution If you are interested only in determining whether a matrix is
diagonalizable and not in actually finding a diagonalizing matrix P, then it is not necessary to
compute bases for the eigenspaces; it suffices to find the dimensions of the eigenspaces. For this
example, the eigenspace corresponding to $\lambda = 1$ is the solution space of the system

$$\begin{bmatrix} 0 & 0 & 0 \\ -1 & -1 & 0 \\ 3 & -5 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Since the coefficient matrix has rank 2 (verify), the nullity of this matrix is 1 by Theorem 4.8.2, and
hence the eigenspace corresponding to $\lambda = 1$ is one-dimensional.

The eigenspace corresponding to $\lambda = 2$ is the solution space of the system

$$\begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 3 & -5 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

This coefficient matrix also has rank 2 and nullity 1 (verify), so the eigenspace corresponding to
$\lambda = 2$ is also one-dimensional. Since the eigenspaces produce a total of two basis vectors, and since
three are needed, the matrix A is not diagonalizable.
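The rank computations in the alternative solution can be automated. The sketch below row-reduces λI - A with exact fractions and reports the dimension of each eigenspace as n minus the rank. The function names `rank` and `eigenspace_dimension` are assumptions made for this illustration, and the eigenvalues are supplied by hand rather than computed.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def eigenspace_dimension(A, lam):
    """Nullity of lam*I - A, i.e. n minus its rank."""
    n = len(A)
    shifted = [[(lam if i == j else 0) - A[i][j] for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A_example_1 = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]   # diagonalizable
A_example_2 = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]   # not diagonalizable

dims_1 = [eigenspace_dimension(A_example_1, lam) for lam in (1, 2)]
dims_2 = [eigenspace_dimension(A_example_2, lam) for lam in (1, 2)]
print(dims_1, sum(dims_1))  # total 3: enough eigenvectors
print(dims_2, sum(dims_2))  # total 2: one short of the three needed
```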



There is an assumption in Example 1 that the column vectors of P, which are made up of basis vectors from the
various eigenspaces of A, are linearly independent. The following theorem, proved at the end of this section, shows
that this is so.

THEOREM 5.2.2

If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are eigenvectors of a matrix A corresponding to distinct eigenvalues, then
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly independent set.



Remark Theorem 5.2.2 is a special case of a more general result: Suppose that $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct
eigenvalues and that we choose a linearly independent set in each of the corresponding eigenspaces. If we then
merge all these vectors into a single set, the result will still be a linearly independent set. For example, if we choose
three linearly independent vectors from one eigenspace and two linearly independent vectors from another
eigenspace, then the five vectors together form a linearly independent set. We omit the proof.

As a consequence of Theorem 5.2.2, we obtain the following important result.

THEOREM 5.2.3

If an $n \times n$ matrix A has n distinct eigenvalues, then A is diagonalizable.

Proof If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are eigenvectors corresponding to the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Theorem
5.2.2, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent. Thus, A is diagonalizable by Theorem 5.2.1.



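EXAMPLE 3 Using Theorem 5.2.3

We saw in Example 3 of the preceding section that

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix}$$

has three distinct eigenvalues: $\lambda = 4$, $\lambda = 2 + \sqrt{3}$, and $\lambda = 2 - \sqrt{3}$. Therefore, A is diagonalizable and

$$P^{-1}AP = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 + \sqrt{3} & 0 \\ 0 & 0 & 2 - \sqrt{3} \end{bmatrix}$$

for some invertible matrix P. If needed, the matrix P can be found using the method shown in
Example 1 of this section.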
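The hypothesis of Theorem 5.2.3 can be checked numerically for the matrix in Example 3. The sketch below evaluates det(λI - A) at the three values 4, 2 + √3, and 2 - √3. Floating-point arithmetic is used for the irrational values, so the residuals are compared against a small tolerance (1e-9, an arbitrary choice); `det3` and `char_poly_at` are hypothetical helper names.

```python
import math

A = [[0, 1, 0],
     [0, 0, 1],
     [4, -17, 8]]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_at(M, lam):
    """Value of det(lam*I - M) for a 3 x 3 matrix M."""
    return det3([[(lam if r == c else 0) - M[r][c] for c in range(3)]
                 for r in range(3)])

candidates = [4, 2 + math.sqrt(3), 2 - math.sqrt(3)]
residuals = [char_poly_at(A, lam) for lam in candidates]

all_roots = all(abs(r) < 1e-9 for r in residuals)
all_distinct = len({round(lam, 9) for lam in candidates}) == 3
print(all_roots, all_distinct)
```

Three distinct roots for a 3 × 3 matrix is exactly the situation covered by Theorem 5.2.3.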



EXAMPLE 4 Diagonalizability of Triangular Matrices

From Theorem 5.1.2, the eigenvalues of a triangular matrix are the entries on its main diagonal.
Thus, a triangular matrix with distinct entries on the main diagonal is diagonalizable. For example,

$$A = \begin{bmatrix} -1 & 2 & 4 & 0 \\ 0 & 3 & 1 & 7 \\ 0 & 0 & 5 & 8 \\ 0 & 0 & 0 & -2 \end{bmatrix}$$

is a diagonalizable matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 5$, $\lambda_4 = -2$.



Computing Powers of a Matrix 



There are many applications in which it is necessary to compute high powers of a square matrix A. We will show 
next that if A happens to be diagonalizable, then the computations can be simplified by diagonalizing A. 



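To start, suppose that A is a diagonalizable $n \times n$ matrix, that P diagonalizes A, and that

$$P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = D$$

Squaring both sides of this equation yields

$$(P^{-1}AP)^2 = \begin{bmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{bmatrix} = D^2$$

We can rewrite the left side of this equation as

$$(P^{-1}AP)^2 = P^{-1}APP^{-1}AP = P^{-1}AIAP = P^{-1}A^2P$$

from which we obtain the relationship $P^{-1}A^2P = D^2$. More generally, if k is a positive integer, then a similar
computation will show that

$$P^{-1}A^kP = D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}$$

which we can rewrite as

$$A^k = PD^kP^{-1} = P\begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}P^{-1} \tag{3}$$

Formula 3 reveals that raising a diagonalizable matrix A to a positive integer power has the effect
of raising its eigenvalues to that power.

Note that computing the right side of this formula involves only three matrix multiplications and the powers of the
diagonal entries of D. For matrices of large size and high powers of $\lambda$, this involves substantially fewer operations
than computing $A^k$ directly.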

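EXAMPLE 5 Power of a Matrix

Use 3 to find $A^{13}$, where

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

Solution We showed in Example 1 that the matrix A is diagonalized by

$$P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

and that

$$D = P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Thus, it follows from 3 that

$$A^{13} = PD^{13}P^{-1} = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2^{13} & 0 & 0 \\ 0 & 2^{13} & 0 \\ 0 & 0 & 1^{13} \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} -8190 & 0 & -16382 \\ 8191 & 8192 & 8191 \\ 8191 & 0 & 16383 \end{bmatrix} \tag{4}$$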



Remark With the method in the preceding example, most of the work is in diagonalizing A. Once that work is
done, it can be used to compute any power of A. Thus, to compute $A^{1000}$ we need only change the exponents from
13 to 1000 in 4.
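The computation in Example 5 and the point of the Remark can be sketched in code. Below, the function `power_via_diagonalization` (a name chosen here for illustration) builds the kth power of D from the eigenvalues and forms the product of P, that power of D, and the inverse of P; `power_direct` multiplies A by itself repeatedly for comparison. The matrices P and its inverse are taken from Example 1.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]
P_inv = [[1, 0, 2], [1, 1, 1], [-1, 0, -1]]
eigenvalues = [2, 2, 1]  # diagonal of D, in the column order of P

def power_via_diagonalization(k):
    """A^k computed as P D^k P^(-1), following Formula 3."""
    D_k = [[eigenvalues[i] ** k if i == j else 0 for j in range(3)]
           for i in range(3)]
    return mat_mul(P, mat_mul(D_k, P_inv))

def power_direct(M, k):
    """M^k by repeated multiplication, for comparison."""
    result = M
    for _ in range(k - 1):
        result = mat_mul(result, M)
    return result

A13 = power_via_diagonalization(13)
print(A13)
print(A13 == power_direct(A, 13))
```

Changing the argument from 13 to 1000 requires no further work beyond raising the eigenvalues to the new power, which is the observation made in the Remark.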



Eigenvalues of Powers of a Matrix 



Once the eigenvalues and eigenvectors of any square matrix A are found, it is a simple matter to find the
eigenvalues and eigenvectors of any positive integer power of A. For example, if $\lambda$ is an eigenvalue of A and x is a
corresponding eigenvector, then

$$A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}$$

which shows not only that $\lambda^2$ is an eigenvalue of $A^2$ but that x is a corresponding eigenvector. In general, we have
the following result.

Note that diagonalizability is not a requirement in Theorem 5.2.4.

THEOREM 5.2.4

If $\lambda$ is an eigenvalue of a square matrix A and x is a corresponding eigenvector, and if k is any positive
integer, then $\lambda^k$ is an eigenvalue of $A^k$ and x is a corresponding eigenvector.

Some problems that use this theorem are given in the exercises. 

Geometric and Algebraic Multiplicity 

Theorem 5.2.3 does not completely settle the diagonalizability question since it only guarantees that a square 
matrix with n distinct eigenvalues is diagonalizable, but does not preclude the possibility that there may exist 
diagonalizable matrices with fewer than n distinct eigenvalues. The following example shows that this is indeed the 
case. 

EXAMPLE 6 The Converse of Theorem 5.2.3 Is False

Consider the matrices

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

It follows from Theorem 5.1.2 that both of these matrices have only one distinct eigenvalue, namely
$\lambda = 1$, and hence only one eigenspace. We leave it as an exercise for you to solve the characteristic
equations

$$(\lambda I - I)\mathbf{x} = \mathbf{0} \quad \text{and} \quad (\lambda I - J)\mathbf{x} = \mathbf{0}$$

with $\lambda = 1$ and show that for I the eigenspace is three-dimensional (all of $R^3$) and for J it is
one-dimensional, consisting of all scalar multiples of

$$\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

This shows that the converse of Theorem 5.2.3 is false, since we have produced two $3 \times 3$ matrices
with fewer than three distinct eigenvalues, one of which is diagonalizable and the other of which is
not.



A full excursion into the study of diagonalizability is left for more advanced courses, but we will touch on one
theorem that is important to a fuller understanding of diagonalizability. It can be proved that if $\lambda_0$ is an eigenvalue
of A, then the dimension of the eigenspace corresponding to $\lambda_0$ cannot exceed the number of times that $\lambda - \lambda_0$
appears as a factor of the characteristic polynomial of A. For example, in Example 1 and Example 2 the
characteristic polynomial is

$$(\lambda - 1)(\lambda - 2)^2$$

Thus, the eigenspace corresponding to $\lambda = 1$ is at most (hence exactly) one-dimensional, and the eigenspace
corresponding to $\lambda = 2$ is at most two-dimensional. In Example 1 the eigenspace corresponding to $\lambda = 2$ actually
had dimension 2, resulting in diagonalizability, but in Example 2 the eigenspace corresponding to $\lambda = 2$ had only
dimension 1, resulting in nondiagonalizability.

There is some terminology that is related to these ideas. If $\lambda_0$ is an eigenvalue of an $n \times n$ matrix A, then the
dimension of the eigenspace corresponding to $\lambda_0$ is called the geometric multiplicity of $\lambda_0$, and the number of
times that $\lambda - \lambda_0$ appears as a factor in the characteristic polynomial of A is called the algebraic multiplicity of $\lambda_0$.
The following theorem, which we state without proof, summarizes the preceding discussion.



THEOREM 5.2.5 Geometric and Algebraic Multiplicity

If A is a square matrix, then:

(a) For every eigenvalue of A, the geometric multiplicity is less than or equal to the algebraic multiplicity. 

(b) A is diagonalizable if and only if the geometric multiplicity of every eigenvalue is equal to the 
algebraic multiplicity. 



OPTIONAL 

We will complete this section with an optional proof of Theorem 5.2.2. 

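Proof of Theorem 5.2.2 Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ be eigenvectors of A corresponding to distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$. We will assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly dependent and obtain a contradiction. We can then
conclude that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent.

Since an eigenvector is nonzero by definition, $\{\mathbf{v}_1\}$ is linearly independent. Let r be the largest integer such that
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is linearly independent. Since we are assuming that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly dependent, r
satisfies $1 \le r < k$. Moreover, by the definition of r, $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r+1}\}$ is linearly dependent. Thus, there are
scalars $c_1, c_2, \ldots, c_{r+1}$, not all zero, such that

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \tag{5}$$

Multiplying both sides of 5 by A and using the fact that

$$A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_{r+1} = \lambda_{r+1}\mathbf{v}_{r+1}$$

we obtain

$$c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_{r+1}\lambda_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \tag{6}$$

If we now multiply both sides of 5 by $\lambda_{r+1}$ and subtract the resulting equation from 6 we obtain

$$c_1(\lambda_1 - \lambda_{r+1})\mathbf{v}_1 + c_2(\lambda_2 - \lambda_{r+1})\mathbf{v}_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})\mathbf{v}_r = \mathbf{0}$$

Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set, this equation implies that

$$c_1(\lambda_1 - \lambda_{r+1}) = c_2(\lambda_2 - \lambda_{r+1}) = \cdots = c_r(\lambda_r - \lambda_{r+1}) = 0$$

and since $\lambda_1, \lambda_2, \ldots, \lambda_{r+1}$ are assumed to be distinct, it follows that

$$c_1 = c_2 = \cdots = c_r = 0 \tag{7}$$

Substituting these values in 5 yields

$$c_{r+1}\mathbf{v}_{r+1} = \mathbf{0}$$

Since the eigenvector $\mathbf{v}_{r+1}$ is nonzero, it follows that

$$c_{r+1} = 0 \tag{8}$$

But equations 7 and 8 contradict the fact that $c_1, c_2, \ldots, c_{r+1}$ are not all zero, so the proof is complete.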



Concept Review 

• Similarity transformation 

• Similarity invariant 

• Similar matrices 

• Diagonalizable matrix 

• Geometric multiplicity 

• Algebraic multiplicity 



Skills 

• Determine whether a square matrix A is diagonalizable. 

• Diagonalize a square matrix A.

• Find powers of a matrix using similarity. 

• Find the geometric multiplicity and the algebraic multiplicity of an eigenvalue. 



Exercise Set 5.2 



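In Exercises 1-4, show that A and B are not similar matrices.

1. $A = \begin{bmatrix} 1 & 1 \\ 3 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 3 & -2 \end{bmatrix}$

Answer:

Possible reason: Determinants are different.

2. $A = \begin{bmatrix} 4 & -1 \\ 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 1 \\ 2 & 4 \end{bmatrix}$

3. $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 & 0 \\ \tfrac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Answer:

Possible reason: Ranks are different.

4. $A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \\ 3 & 0 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 0 \\ 0 & 1 & 1 \end{bmatrix}$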



5. Let A be a $6 \times 6$ matrix with characteristic equation $\lambda^2(\lambda - 1)(\lambda - 2)^3 = 0$. What are the possible dimensions
for eigenspaces of A?

Answer:

$\lambda = 0$: 1 or 2; $\lambda = 1$: 1; $\lambda = 2$: 1, 2, or 3

6. Let 



A = 



4 0 1 
2 3 2 
1 0 4 



(a) Find the eigenvalues of A. 

(b) For each eigenvalue $\lambda$, find the rank of the matrix $\lambda I - A$.

(c) Is A diagonalizable? Justify your conclusion. 

In Exercises 7-11, use the method of Exercise 6 to determine whether the matrix is diagonalizable. 

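7. $\begin{bmatrix} 2 & 0 \\ 1 & 2 \end{bmatrix}$

Answer:

Not diagonalizable

8. $\begin{bmatrix} 2 & -3 \\ 1 & -1 \end{bmatrix}$

9. $\begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}$

Answer:

Not diagonalizable

10. $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 3 & 0 \\ -4 & 13 & -1 \end{bmatrix}$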



11. 



-1 0 

2 1 

0 3 

0 0 



1 

-1 
2 
3 



Answer: 



Not diagonalizable 

In Exercises 12-15, find a matrix P that diagonalizes A, and compute $P^{-1}AP$.



n.._\-\A 12] 

^"[-20 \l\ 



Answer: 



14. 



i4 = 



15. 



i4 = 



1 1 
10 0 
0 1 1 
0 1 1 

2 0 -2 
0 3 0 
0 0 3 



Answer: 



-2 


0 


r 


; P-'^AP = 


'3 


0 


0" 


0 


1 


0 


0 


3 


0 


1 


0 


0 




0 


0 


2 



In Exercises 16-21, find the geometric and algebraic multiplicity of each eigenvalue of the matrix A, and
determine whether A is diagonalizable. If A is diagonalizable, then find a matrix P that diagonalizes A, and find
$P^{-1}AP$.


16. 



17. 



A = 



19 -9 -6 
25 -11 -9 
17 -9 -4 

-1 4 -2 
-3 4 0 
-3 1 3 



Answer: 



1 2 1 
1 3 3 
1 3 4 



1 0 0 
0 2 0 
0 0 3 



18. 



19. 



A = 



Answer: 



1 0 0 

0 1 0 
-3 0 1 



, P-^AP: 



0 0 0 
0 0 0 
0 0 1 



20. 



A = 



21. 



A = 



-2 
0 
0 
0 

-2 

0 
0 
0 



0 0 0 

-2 0 0 

0 3 0 

0 1 3 



0 0 

■2 5 

0 3 

0 0 



0 
-5 

0 
3 



Answer: 



; P-^AP: 



-2 

0 
0 
0 



0 0 0 
-2 0 0 
0 3 0 
0 0 3 



22. Use the method of Example 5 to compute where 

-[-1 ^] 



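23. Use the method of Example 5 to compute $A^{11}$, where

$$A = \begin{bmatrix} -1 & 7 & -1 \\ 0 & 1 & 0 \\ 0 & 15 & -2 \end{bmatrix}$$

Answer:

$$\begin{bmatrix} -1 & 10237 & -2047 \\ 0 & 1 & 0 \\ 0 & 10245 & -2048 \end{bmatrix}$$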

24. In each part, compute the stated power of 





1 


-2 


8" 






0 


-1 


0 






0 


0 


-1 




A^^ (b) A-^^ 


(c) 


^2301 



25. Find if n is a positive integer and 



Answer: 



"1 


1 


r 


"r 


0 


0 ' 


2 


0 




0 


3" 


0 


1 


-1 


1 


0 


0 


4« 



3 


-1 


0 


-1 


2 


-1 


0 


-1 


3 


1 


1 


r 


6 


3 


6 


1 


0 


1 


2 




"2 


1 


1 


1 


3 


3 


3 



(d) $A^{-2301}$

26. Let

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

Show that

(a) $A$ is diagonalizable if $(a - d)^2 + 4bc > 0$.

(b) $A$ is not diagonalizable if $(a - d)^2 + 4bc < 0$.

[Hint: See Exercise 19 of Section 5.1.]

27. In the case where the matrix $A$ in Exercise 26 is diagonalizable, find a matrix $P$ that diagonalizes $A$. [Hint: See Exercise 20 of Section 5.1.]

Answer:

One possibility is

\[ P = \begin{bmatrix} -b & -b \\ a - \lambda_1 & a - \lambda_2 \end{bmatrix} \]

where $\lambda_1$ and $\lambda_2$ are as in Exercise 20 of Section 5.1.



28. Prove that similar matrices have the same rank. 

29. Prove that similar matrices have the same nullity. 



30. Prove that similar matrices have the same trace.

31. Prove that if $A$ is diagonalizable, then so is $A^k$ for every positive integer $k$.

32. Prove that if $A$ is a diagonalizable matrix, then the rank of $A$ is the number of nonzero eigenvalues of $A$.

33. Suppose that the characteristic polynomial of some matrix $A$ is found to be $p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part, answer the question and explain your reasoning.

(a) What can you say about the dimensions of the eigenspaces of $A$?

(b) What can you say about the dimensions of the eigenspaces if you know that $A$ is diagonalizable?

(c) If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of eigenvectors of $A$ all of which correspond to the same eigenvalue of $A$, what can you say about the eigenvalue?

Answer:

(a) $\lambda = 1$: dimension $= 1$; $\lambda = 3$: dimension $\le 2$; $\lambda = 4$: dimension $\le 3$

(b) Dimensions will be exactly 1, 2, and 3.

(c) $\lambda = 4$

34. This problem will lead you through a proof of the fact that the algebraic multiplicity of an eigenvalue of an $n \times n$ matrix $A$ is greater than or equal to the geometric multiplicity. For this purpose, assume that $\lambda_0$ is an eigenvalue with geometric multiplicity $k$.

(a) Prove that there is a basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$ in which the first $k$ vectors of $B$ form a basis for the eigenspace corresponding to $\lambda_0$.

(b) Let $P$ be the matrix having the vectors in $B$ as columns. Prove that the product $AP$ can be expressed as

\[ AP = P \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix} \]

[Hint: Compare the first $k$ column vectors on both sides.]

(c) Use the result in part (b) to prove that $A$ is similar to

\[ C = \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix} \]

and hence that $A$ and $C$ have the same characteristic polynomial.

(d) By considering $\det(\lambda I - C)$, prove that the characteristic polynomial of $C$ (and hence $A$) contains the factor $(\lambda - \lambda_0)$ at least $k$ times, thereby proving that the algebraic multiplicity of $\lambda_0$ is greater than or equal to the geometric multiplicity $k$.

True-False Exercises 



In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 

(a) Every square matrix is similar to itself. 
Answer: 

True 

(b) If $A$, $B$, and $C$ are matrices for which $A$ is similar to $B$ and $B$ is similar to $C$, then $A$ is similar to $C$.



Answer: 

True 

(c) If $A$ and $B$ are similar invertible matrices, then $A^{-1}$ and $B^{-1}$ are similar.
Answer: 

True 

(d) If A is diagonalizable, then there is a unique matrix P such that P~^AP is diagonal. 
Answer: 

False 

(e) If $A$ is diagonalizable and invertible, then $A^{-1}$ is diagonalizable.
Answer: 

True 

(f) If $A$ is diagonalizable, then $A^T$ is diagonalizable.
Answer: 

True 

(g) If there is a basis for $R^n$ consisting of eigenvectors of an $n \times n$ matrix $A$, then $A$ is diagonalizable.
Answer:

True

(h) If every eigenvalue of a matrix $A$ has algebraic multiplicity 1, then $A$ is diagonalizable.
Answer: 

True 
5.3 Complex Vector Spaces



Because the characteristic equation of any square matrix can have complex solutions, the notions of complex eigenvalues and 
eigenvectors arise naturally, even within the context of matrices with real entries. In this section we will discuss this idea and 
apply our results to study symmetric matrices in more detail. A review of the essentials of complex numbers appears in the 
back of this text. 



Review of Complex Numbers 

Recall that if $z = a + bi$ is a complex number, then:

• $\operatorname{Re}(z) = a$ and $\operatorname{Im}(z) = b$ are called the real part of $z$ and the imaginary part of $z$, respectively,

• $|z| = \sqrt{a^2 + b^2}$ is called the modulus (or absolute value) of $z$,

• $\overline{z} = a - bi$ is called the complex conjugate of $z$,

• the angle $\phi$ in Figure 5.3.1 is called an argument of $z$,

• $\operatorname{Re}(z) = |z| \cos\phi$

• $\operatorname{Im}(z) = |z| \sin\phi$

• $z = |z|(\cos\phi + i\sin\phi)$ is called the polar form of $z$.




Figure 5.3.1 
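The quantities in this review can be checked numerically. The following is a minimal sketch using Python's built-in complex type; the sample value $z = 3 + 4i$ and all variable names are illustrative assumptions, not part of the text.

```python
import cmath
import math

# Sample complex number z = a + bi with a = 3, b = 4 (an arbitrary choice).
z = 3 + 4j

real_part = z.real            # Re(z) = a
imag_part = z.imag            # Im(z) = b
modulus = abs(z)              # |z| = sqrt(a^2 + b^2)
conjugate = z.conjugate()     # a - bi
phi = cmath.phase(z)          # one argument of z, chosen in (-pi, pi]

# Rebuild z from its polar form |z|(cos(phi) + i sin(phi)).
rebuilt = modulus * complex(math.cos(phi), math.sin(phi))

print(real_part, imag_part, modulus, conjugate)
print(abs(rebuilt - z) < 1e-12)
```

Because an argument is only determined up to multiples of $2\pi$, the value returned by `cmath.phase` is one particular choice among infinitely many.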



Complex Eigenvalues 



In Formula 3 of Section 5.1 we observed that the characteristic equation of a general $n \times n$ matrix $A$ has the form

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \qquad (1) \]

in which the highest power of $\lambda$ has a coefficient of 1. Up to now we have limited our discussion to matrices in which the solutions of 1 are real numbers. However, it is possible for the characteristic equation of a matrix $A$ with real entries to have imaginary solutions; for example, the characteristic equation of the matrix

\[ A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \]

is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = 0 \]

which has the imaginary solutions $\lambda = i$ and $\lambda = -i$. To deal with this case we will need to explore the notion of a complex vector space and some related ideas.



Vectors in $C^n$



A vector space in which scalars are allowed to be complex numbers is called a complex vector space. In this section we will be concerned only with the following complex generalization of the real vector space $R^n$.



DEFINITION 1

If $n$ is a positive integer, then a complex n-tuple is a sequence of $n$ complex numbers $(v_1, v_2, \ldots, v_n)$. The set of all complex n-tuples is called complex n-space and is denoted by $C^n$. Scalars are complex numbers, and the operations of addition, subtraction, and scalar multiplication are performed componentwise.



The terminology used for n-tuples of real numbers applies to complex n-tuples without change. Thus, if $v_1, v_2, \ldots, v_n$ are complex numbers, then we call $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ a vector in $C^n$ and $v_1, v_2, \ldots, v_n$ its components. Some examples of vectors in $C^3$ are

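\[ \mathbf{u} = (1 + i,\; -4i,\; 3 + 2i), \quad \mathbf{v} = (0,\; i,\; 5), \quad \mathbf{w} = \left(6 - \sqrt{2}\,i,\; 9 + \tfrac{1}{2}i,\; \pi i\right) \]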

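Every vector

\[ \mathbf{v} = (v_1, v_2, \ldots, v_n) = (a_1 + b_1 i,\; a_2 + b_2 i,\; \ldots,\; a_n + b_n i) \]

in $C^n$ can be split into real and imaginary parts as

\[ \mathbf{v} = (a_1, a_2, \ldots, a_n) + i(b_1, b_2, \ldots, b_n) \]

which we also denote as

\[ \mathbf{v} = \operatorname{Re}(\mathbf{v}) + i\operatorname{Im}(\mathbf{v}) \]

where

\[ \operatorname{Re}(\mathbf{v}) = (a_1, a_2, \ldots, a_n) \quad \text{and} \quad \operatorname{Im}(\mathbf{v}) = (b_1, b_2, \ldots, b_n) \]

The vector

\[ \overline{\mathbf{v}} = (\overline{v}_1, \overline{v}_2, \ldots, \overline{v}_n) = (a_1 - b_1 i,\; a_2 - b_2 i,\; \ldots,\; a_n - b_n i) \]

is called the complex conjugate of $\mathbf{v}$ and can be expressed in terms of $\operatorname{Re}(\mathbf{v})$ and $\operatorname{Im}(\mathbf{v})$ as

\[ \overline{\mathbf{v}} = (a_1, a_2, \ldots, a_n) - i(b_1, b_2, \ldots, b_n) = \operatorname{Re}(\mathbf{v}) - i\operatorname{Im}(\mathbf{v}) \qquad (2) \]

It follows that the vectors in $R^n$ can be viewed as those vectors in $C^n$ whose imaginary part is zero; or stated another way, a vector $\mathbf{v}$ in $C^n$ is in $R^n$ if and only if $\overline{\mathbf{v}} = \mathbf{v}$.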

In this section we will also need to consider matrices with complex entries, so henceforth we will call a matrix A a real matrix 
if its entries are required to be real numbers and a complex matrix if its entries are allowed to be complex numbers. The 
standard operations on real matrices carry over to complex matrices without change, and all of the familiar properties of 
matrices continue to hold. 

If $A$ is a complex matrix, then $\operatorname{Re}(A)$ and $\operatorname{Im}(A)$ are the matrices formed from the real and imaginary parts of the entries of $A$, and $\overline{A}$ is the matrix formed by taking the complex conjugate of each entry in $A$.

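EXAMPLE 1 Real and Imaginary Parts of Vectors and Matrices

Let

\[ \mathbf{v} = (3 + i,\; -2i,\; 5) \quad \text{and} \quad A = \begin{bmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{bmatrix} \]

Then

\[ \overline{\mathbf{v}} = (3 - i,\; 2i,\; 5), \quad \operatorname{Re}(\mathbf{v}) = (3, 0, 5), \quad \operatorname{Im}(\mathbf{v}) = (1, -2, 0) \]

\[ \overline{A} = \begin{bmatrix} 1 - i & i \\ 4 & 6 + 2i \end{bmatrix}, \quad \operatorname{Re}(A) = \begin{bmatrix} 1 & 0 \\ 4 & 6 \end{bmatrix}, \quad \operatorname{Im}(A) = \begin{bmatrix} 1 & -1 \\ 0 & -2 \end{bmatrix} \]

\[ \det(A) = \begin{vmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{vmatrix} = (1 + i)(6 - 2i) - (-i)(4) = 8 + 8i \]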
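The computations in Example 1 can be reproduced with a few lines of code. This is a sketch only, with vectors and matrices stored as plain Python lists; the helper names `conj_vec`, `re_vec`, and `im_vec` are hypothetical and chosen for this illustration.

```python
# Data from Example 1: v = (3 + i, -2i, 5) and A = [[1 + i, -i], [4, 6 - 2i]].
v = [3 + 1j, -2j, 5 + 0j]
A = [[1 + 1j, -1j],
     [4 + 0j, 6 - 2j]]

def conj_vec(x):
    """Componentwise complex conjugate of a vector."""
    return [c.conjugate() for c in x]

def re_vec(x):
    """Vector of real parts."""
    return [c.real for c in x]

def im_vec(x):
    """Vector of imaginary parts."""
    return [c.imag for c in x]

# The same operations applied entry by entry to the matrix A.
A_bar = [conj_vec(row) for row in A]
re_A = [re_vec(row) for row in A]
im_A = [im_vec(row) for row in A]

# Determinant of a 2 x 2 matrix: ad - bc, valid for complex entries as well.
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print(conj_vec(v), re_vec(v), im_vec(v))
print(A_bar, re_A, im_A, det_A)
```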



Algebraic Properties of the Complex Conjugate 

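The next two theorems list some properties of complex vectors and matrices that we will need in this section. Some of the proofs are given as exercises.

THEOREM 5.3.1

If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $C^n$, and if $k$ is a scalar, then:

(a) $\overline{\overline{\mathbf{u}}} = \mathbf{u}$

(b) $\overline{k\mathbf{u}} = \overline{k}\,\overline{\mathbf{u}}$

(c) $\overline{\mathbf{u} + \mathbf{v}} = \overline{\mathbf{u}} + \overline{\mathbf{v}}$

(d) $\overline{\mathbf{u} - \mathbf{v}} = \overline{\mathbf{u}} - \overline{\mathbf{v}}$

THEOREM 5.3.2

If $A$ is an $m \times k$ complex matrix and $B$ is a $k \times n$ complex matrix, then:

(a) $\overline{\overline{A}} = A$

(b) $\overline{(A^T)} = (\overline{A})^T$

(c) $\overline{AB} = \overline{A}\,\overline{B}$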



The Complex Euclidean Inner Product 

The following definition extends the notions of dot product and norm to $C^n$.
DEFINITION 2

If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $C^n$, then the complex Euclidean inner product of $\mathbf{u}$ and $\mathbf{v}$ (also called the complex dot product) is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined as

\[ \mathbf{u} \cdot \mathbf{v} = u_1\overline{v}_1 + u_2\overline{v}_2 + \cdots + u_n\overline{v}_n \qquad (3) \]

We also define the Euclidean norm on $C^n$ to be

\[ \|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2} \qquad (4) \]

As in the real case, we call $\mathbf{v}$ a unit vector in $C^n$ if $\|\mathbf{v}\| = 1$, and we say two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$.

The complex conjugates in 3 ensure that $\|\mathbf{v}\|$ is a real number, for without them the quantity $\mathbf{v} \cdot \mathbf{v}$ in 4 might be imaginary.

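EXAMPLE 2 Complex Euclidean Inner Product and Norm

Find $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{v} \cdot \mathbf{u}$, $\|\mathbf{u}\|$, and $\|\mathbf{v}\|$ for the vectors

\[ \mathbf{u} = (1 + i,\; i,\; 3 - i) \quad \text{and} \quad \mathbf{v} = (1 + i,\; 2,\; 4i) \]

Solution

\[ \mathbf{u} \cdot \mathbf{v} = (1 + i)\overline{(1 + i)} + i\,\overline{(2)} + (3 - i)\overline{(4i)} = (1 + i)(1 - i) + 2i + (3 - i)(-4i) = -2 - 10i \]

\[ \mathbf{v} \cdot \mathbf{u} = (1 + i)\overline{(1 + i)} + 2\,\overline{(i)} + (4i)\overline{(3 - i)} = (1 + i)(1 - i) - 2i + 4i(3 + i) = -2 + 10i \]

\[ \|\mathbf{u}\| = \sqrt{|1 + i|^2 + |i|^2 + |3 - i|^2} = \sqrt{2 + 1 + 10} = \sqrt{13} \]

\[ \|\mathbf{v}\| = \sqrt{|1 + i|^2 + |2|^2 + |4i|^2} = \sqrt{2 + 4 + 16} = \sqrt{22} \]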
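The following short sketch evaluates Formulas 3 and 4 for the vectors of Example 2. The function names `cdot` and `norm` are hypothetical helpers introduced here; note that the conjugate is placed on the second factor, as in Definition 2.

```python
import math

def cdot(u, v):
    """Complex Euclidean inner product: sum of u_k times conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(v):
    """Euclidean norm on C^n; v . v is real, so only its real part is used."""
    return math.sqrt(cdot(v, v).real)

u = [1 + 1j, 1j, 3 - 1j]
v = [1 + 1j, 2 + 0j, 4j]

print(cdot(u, v))            # u . v
print(cdot(v, u))            # v . u, the conjugate of u . v
print(norm(u), norm(v))      # sqrt(13) and sqrt(22)
```

Swapping the arguments of `cdot` conjugates the result, which is the behavior that distinguishes the complex dot product from the real one.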



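Recall from Table 1 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are column vectors in $R^n$, then their dot product can be expressed as

\[ \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u} \]

The analogous formulas in $C^n$ are (verify)

\[ \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\overline{\mathbf{v}} = \overline{\mathbf{v}}^T\mathbf{u} \qquad (5) \]

Example 2 reveals a major difference between the dot product on $R^n$ and the complex dot product on $C^n$. For the dot product on $R^n$ we always have $\mathbf{v} \cdot \mathbf{u} = \mathbf{u} \cdot \mathbf{v}$ (the symmetry property), but for the complex dot product the corresponding relationship is given by $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$, which is called its antisymmetry property. The following theorem is an analog of Theorem 3.2.2.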

THEOREM 5.3.3 

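If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in $C^n$, and if $k$ is a scalar, then the complex Euclidean inner product has the following properties:

(a) $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$ [Antisymmetry property]

(b) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ [Distributive property]

(c) $k(\mathbf{u} \cdot \mathbf{v}) = (k\mathbf{u}) \cdot \mathbf{v}$ [Homogeneity property]

(d) $\mathbf{u} \cdot k\mathbf{v} = \overline{k}(\mathbf{u} \cdot \mathbf{v})$ [Antihomogeneity property]

(e) $\mathbf{v} \cdot \mathbf{v} \ge 0$ and $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$. [Positivity property]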



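Parts (c) and (d) of this theorem state that a scalar multiplying a complex Euclidean inner product can be regrouped with the first vector, but to regroup it with the second vector you must first take its complex conjugate. We will prove part (d), and leave the others as exercises.

Proof (d)

\[ k(\mathbf{u} \cdot \mathbf{v}) = k\,\overline{(\mathbf{v} \cdot \mathbf{u})} = \overline{\overline{k}}\;\overline{(\mathbf{v} \cdot \mathbf{u})} = \overline{\overline{k}(\mathbf{v} \cdot \mathbf{u})} = \overline{(\overline{k}\mathbf{v}) \cdot \mathbf{u}} = \mathbf{u} \cdot (\overline{k}\mathbf{v}) \]

To complete the proof, substitute $\overline{k}$ for $k$ and use the fact that $\overline{\overline{k}} = k$.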



Vector Concepts in $C^n$

Except for the use of complex scalars, the notions of linear combination, linear independence, subspace, spanning, basis, and dimension carry over without change to $C^n$.

Is $R^n$ a subspace of $C^n$? Explain.



Eigenvalues and eigenvectors are defined for complex matrices exactly as for real matrices. If $A$ is an $n \times n$ matrix with complex entries, then the complex roots of the characteristic equation $\det(\lambda I - A) = 0$ are called complex eigenvalues of $A$. As in the real case, $\lambda$ is a complex eigenvalue of $A$ if and only if there exists a nonzero vector $\mathbf{x}$ in $C^n$ such that $A\mathbf{x} = \lambda\mathbf{x}$. Each such $\mathbf{x}$ is called a complex eigenvector of $A$ corresponding to $\lambda$. The complex eigenvectors of $A$ corresponding to $\lambda$ are the nonzero solutions of the linear system $(\lambda I - A)\mathbf{x} = \mathbf{0}$, and the set of all such solutions is a subspace of $C^n$, called the eigenspace of $A$ corresponding to $\lambda$.

The following theorem states that if a real matrix has complex eigenvalues, then those eigenvalues and their corresponding 
eigenvectors occur in conjugate pairs. 



THEOREM 5.3.4 

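If $\lambda$ is an eigenvalue of a real $n \times n$ matrix $A$, and if $\mathbf{x}$ is a corresponding eigenvector, then $\overline{\lambda}$ is also an eigenvalue of $A$, and $\overline{\mathbf{x}}$ is a corresponding eigenvector.



Proof Since $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector, we have

\[ \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \overline{\lambda}\,\overline{\mathbf{x}} \qquad (6) \]

However, $\overline{A} = A$, since $A$ has real entries, so it follows from part (c) of Theorem 5.3.2 that

\[ \overline{A\mathbf{x}} = \overline{A}\,\overline{\mathbf{x}} = A\overline{\mathbf{x}} \qquad (7) \]

Equations 6 and 7 together imply that

\[ A\overline{\mathbf{x}} = \overline{A\mathbf{x}} = \overline{\lambda}\,\overline{\mathbf{x}} \]

in which $\overline{\mathbf{x}} \ne \mathbf{0}$ (why?); this tells us that $\overline{\lambda}$ is an eigenvalue of $A$ and $\overline{\mathbf{x}}$ is a corresponding eigenvector.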



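EXAMPLE 3 Complex Eigenvalues and Eigenvectors

Find the eigenvalues and bases for the eigenspaces of

\[ A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \]

Solution The characteristic polynomial of $A$ is

\[ \begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = (\lambda - i)(\lambda + i) \]

so the eigenvalues of $A$ are $\lambda = i$ and $\lambda = -i$. Note that these eigenvalues are complex conjugates, as guaranteed by Theorem 5.3.4.

To find the eigenvectors we must solve the system

\[ \begin{bmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

with $\lambda = i$ and then with $\lambda = -i$. With $\lambda = i$, this system becomes

\[ \begin{bmatrix} i + 2 & 1 \\ -5 & i - 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad (8) \]

We could solve this system by reducing the augmented matrix

\[ \begin{bmatrix} i + 2 & 1 & 0 \\ -5 & i - 2 & 0 \end{bmatrix} \qquad (9) \]

to reduced row echelon form by Gauss-Jordan elimination, though the complex arithmetic is somewhat tedious. A simpler procedure here is first to observe that the reduced row echelon form of 9 must have a row of zeros because 8 has nontrivial solutions. This being the case, each row of 9 must be a scalar multiple of the other, and hence the first row can be made into a row of zeros by adding a suitable multiple of the second row to it. Accordingly, we can simply set the entries in the first row to zero, then interchange the rows, and then multiply the new first row by $-\tfrac{1}{5}$ to obtain the reduced row echelon form

\[ \begin{bmatrix} 1 & \tfrac{2}{5} - \tfrac{1}{5}i & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

Thus, a general solution of the system is

\[ x_1 = \left(-\tfrac{2}{5} + \tfrac{1}{5}i\right)t, \quad x_2 = t \]

This tells us that the eigenspace corresponding to $\lambda = i$ is one-dimensional and consists of all complex scalar multiples of the basis vector

\[ \mathbf{x} = \begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix} \qquad (10) \]

As a check, let us confirm that $A\mathbf{x} = i\mathbf{x}$. We obtain

\[ A\mathbf{x} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix} = \begin{bmatrix} -2\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) + 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{5} - \tfrac{2}{5}i \\ i \end{bmatrix} = i\mathbf{x} \]

We could find a basis for the eigenspace corresponding to $\lambda = -i$ in a similar way, but the work is unnecessary, since Theorem 5.3.4 implies that

\[ \overline{\mathbf{x}} = \begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix} \qquad (11) \]

must be a basis for this eigenspace. The following computations confirm that $\overline{\mathbf{x}}$ is an eigenvector of $A$ corresponding to $\lambda = -i$:

\[ A\overline{\mathbf{x}} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix} = \begin{bmatrix} -2\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) + 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{5} + \tfrac{2}{5}i \\ -i \end{bmatrix} = -i\overline{\mathbf{x}} \]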
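The conjugate-pair behavior described in Theorem 5.3.4 can be checked numerically for the matrix of Example 3. The sketch below uses floating-point complex arithmetic, so equality is tested up to a small tolerance; `matvec` and `is_eigenpair` are hypothetical helper names.

```python
A = [[-2, -1],
     [5, 2]]

def matvec(M, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * c for m, c in zip(row, x)) for row in M]

def is_eigenpair(M, lam, x, tol=1e-12):
    """True if M x equals lam * x in every component, up to tol."""
    Mx = matvec(M, x)
    return all(abs(a - lam * b) < tol for a, b in zip(Mx, x))

# Eigenvector for lambda = i, and its componentwise conjugate.
x = [complex(-2 / 5, 1 / 5), 1 + 0j]
x_bar = [c.conjugate() for c in x]

print(is_eigenpair(A, 1j, x))        # x belongs to lambda = i
print(is_eigenpair(A, -1j, x_bar))   # conj(x) belongs to lambda = -i
print(is_eigenpair(A, 1j, x_bar))    # conj(x) does not belong to lambda = i
```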



Since a number of our subsequent examples will involve $2 \times 2$ matrices with real entries, it will be useful to discuss some general results about the eigenvalues of such matrices. Observe first that the characteristic polynomial of the matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda - a & -b \\ -c & \lambda - d \end{vmatrix} = (\lambda - a)(\lambda - d) - bc = \lambda^2 - (a + d)\lambda + (ad - bc) \]

We can express this in terms of the trace and determinant of $A$ as

\[ \det(\lambda I - A) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A) \qquad (12) \]

from which it follows that the characteristic equation of $A$ is

\[ \lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0 \qquad (13) \]

Now recall from algebra that if $ax^2 + bx + c = 0$ is a quadratic equation with real coefficients, then the discriminant $b^2 - 4ac$ determines the nature of the roots:

$b^2 - 4ac > 0$ [Two distinct real roots]
$b^2 - 4ac = 0$ [One repeated real root]
$b^2 - 4ac < 0$ [Two conjugate imaginary roots]

Applying this to 13 with $a = 1$, $b = -\operatorname{tr}(A)$, and $c = \det(A)$ yields the following theorem.



Olga Taussky-Todd (1906-1995) 



Historical Note Olga Taussky-Todd was one of the pioneering women in matrix analysis and the first woman 
appointed to the faculty at the California Institute of Technology. She worked at the National Physical Laboratory in 
London during World War II, where she was assigned to study flutter in supersonic aircraft. While there, she realized 
that some results about the eigenvalues of a certain 6x6 complex matrix could be used to answer key questions about 
the flutter problem that would otherwise have required laborious calculation. After World War II Olga Taussky-Todd 
continued her work on matrix-related subjects and helped to draw many known but disparate results about matrices 
into the coherent subject that we now call matrix theory. 
[Image: Courtesy of the Archives, California Institute of Technology] 



THEOREM 5.3.5 

If $A$ is a $2 \times 2$ matrix with real entries, then the characteristic equation of $A$ is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$ and

(a) $A$ has two distinct real eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) > 0$;

(b) $A$ has one repeated real eigenvalue if $\operatorname{tr}(A)^2 - 4\det(A) = 0$;

(c) $A$ has two complex conjugate eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) < 0$.

EXAMPLE 4 Eigenvalues of a 2 × 2 Matrix

In each part, use Formula 13 for the characteristic equation to find the eigenvalues of $A$.
Solution

(a) We have $\operatorname{tr}(A) = 7$ and $\det(A) = 12$, so the characteristic equation of $A$ is

\[ \lambda^2 - 7\lambda + 12 = 0 \]

Factoring yields $(\lambda - 4)(\lambda - 3) = 0$, so the eigenvalues of $A$ are $\lambda = 4$ and $\lambda = 3$.

(b) We have $\operatorname{tr}(A) = 2$ and $\det(A) = 1$, so the characteristic equation of $A$ is

\[ \lambda^2 - 2\lambda + 1 = 0 \]

Factoring this equation yields $(\lambda - 1)^2 = 0$, so $\lambda = 1$ is the only eigenvalue of $A$; it has algebraic multiplicity 2.

(c) We have $\operatorname{tr}(A) = 4$ and $\det(A) = 13$, so the characteristic equation of $A$ is

\[ \lambda^2 - 4\lambda + 13 = 0 \]

Solving this equation by the quadratic formula yields

\[ \lambda = \frac{4 \pm \sqrt{(-4)^2 - 4(13)}}{2} = \frac{4 \pm \sqrt{-36}}{2} = 2 \pm 3i \]

Thus, the eigenvalues of $A$ are $\lambda = 2 + 3i$ and $\lambda = 2 - 3i$.
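Formula 13 translates directly into a small routine that needs only the trace and the determinant. The sketch below applies the quadratic formula with a complex square root, so all three cases of Theorem 5.3.5 are handled by the same code; the function name `eigenvalues_2x2` is a hypothetical choice, and the three (trace, determinant) pairs are those of Example 4.

```python
import cmath

def eigenvalues_2x2(trace, det):
    """Roots of lambda^2 - trace*lambda + det = 0 by the quadratic formula."""
    disc = trace * trace - 4 * det      # sign decides the case in Theorem 5.3.5
    root = cmath.sqrt(disc)             # complex square root; works when disc < 0
    return ((trace + root) / 2, (trace - root) / 2)

# (trace, determinant) pairs from parts (a), (b), and (c) of Example 4.
for trace, det in [(7, 12), (2, 1), (4, 13)]:
    print(trace, det, eigenvalues_2x2(trace, det))
```

The results are returned as complex numbers in every case; for parts (a) and (b) the imaginary parts are zero.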



Symmetric Matrices Have Real Eigenvalues 



Our next result, which is concerned with the eigenvalues of real symmetric matrices, is important in a wide variety of 
applications. The key to its proof is to think of a real symmetric matrix as a complex matrix whose entries have an imaginary 
part of zero. 

THEOREM 5.3.6 

If $A$ is a real symmetric matrix, then $A$ has real eigenvalues.



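Proof Suppose that $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector, where we allow for the possibility that $\lambda$ is complex and $\mathbf{x}$ is in $C^n$. Thus,

\[ A\mathbf{x} = \lambda\mathbf{x} \]

where $\mathbf{x} \ne \mathbf{0}$. If we multiply both sides of this equation by $\overline{\mathbf{x}}^T$ and use the fact that

\[ \overline{\mathbf{x}}^T A\mathbf{x} = \overline{\mathbf{x}}^T(\lambda\mathbf{x}) = \lambda(\overline{\mathbf{x}}^T\mathbf{x}) = \lambda(\mathbf{x} \cdot \mathbf{x}) = \lambda\|\mathbf{x}\|^2 \]

then we obtain

\[ \lambda = \frac{\overline{\mathbf{x}}^T A\mathbf{x}}{\|\mathbf{x}\|^2} \]

Since the denominator in this expression is real, we can prove that $\lambda$ is real by showing that

\[ \overline{\overline{\mathbf{x}}^T A\mathbf{x}} = \overline{\mathbf{x}}^T A\mathbf{x} \qquad (14) \]

But $A$ is symmetric and has real entries, so it follows from the second equality in Formula 5 and properties of the conjugate that

\[ \overline{\overline{\mathbf{x}}^T A\mathbf{x}} = \mathbf{x}^T\,\overline{A\mathbf{x}} = (\overline{A\mathbf{x}})^T\mathbf{x} = (\overline{A}\,\overline{\mathbf{x}})^T\mathbf{x} = (A\overline{\mathbf{x}})^T\mathbf{x} = \overline{\mathbf{x}}^T A^T\mathbf{x} = \overline{\mathbf{x}}^T A\mathbf{x} \]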
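For the $2 \times 2$ case, Theorem 5.3.6 can also be seen directly from the discriminant test in Theorem 5.3.5. The following worked computation is offered as a consistency check, not as a replacement for the general proof; the entries $a$, $b$, $d$ denote arbitrary real numbers.

\[ A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}, \qquad \operatorname{tr}(A)^2 - 4\det(A) = (a + d)^2 - 4(ad - b^2) = (a - d)^2 + 4b^2 \ge 0 \]

Since the discriminant is never negative, case (c) of Theorem 5.3.5 cannot occur, so both eigenvalues are real. The eigenvalue is repeated only when $a = d$ and $b = 0$, that is, when $A$ is a scalar multiple of the identity.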



A Geometric Interpretation of Complex Eigenvalues 

The following theorem is the key to understanding the geometric significance of complex eigenvalues of real $2 \times 2$ matrices.

THEOREM 5.3.7

The eigenvalues of the real matrix

\[ C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \qquad (15) \]

are $\lambda = a \pm bi$. If $a$ and $b$ are not both zero, then this matrix can be factored as

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \qquad (16) \]

where $\phi$ is the angle from the positive x-axis to the ray that joins the origin to the point $(a, b)$ (Figure 5.3.2).

Figure 5.3.2

Geometrically, this theorem states that multiplication by a matrix of form 15 can be viewed as a rotation through the angle $\phi$ followed by a scaling with factor $|\lambda|$ (Figure 5.3.3).

Figure 5.3.3

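Proof The characteristic equation of $C$ is $(\lambda - a)^2 + b^2 = 0$ (verify), from which it follows that the eigenvalues of $C$ are $\lambda = a \pm bi$. Assuming that $a$ and $b$ are not both zero, let $\phi$ be the angle from the positive x-axis to the ray that joins the origin to the point $(a, b)$. The angle $\phi$ is an argument of the eigenvalue $\lambda = a + bi$, so we see from Figure 5.3.2 that

\[ a = |\lambda|\cos\phi \quad \text{and} \quad b = |\lambda|\sin\phi \]

It follows from this that the matrix in 15 can be written as

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \dfrac{a}{|\lambda|} & -\dfrac{b}{|\lambda|} \\[2mm] \dfrac{b}{|\lambda|} & \dfrac{a}{|\lambda|} \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \]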
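The factorization in Theorem 5.3.7 can be computed from $a$ and $b$ alone. In the sketch below, `scale_and_angle` and `rebuild` are hypothetical helper names, and the sample values $a = 1$, $b = 1$ are an arbitrary choice; the angle is taken in the range $(-\pi, \pi]$ by using the two-argument arctangent.

```python
import math

def scale_and_angle(a, b):
    """Return (|lambda|, phi) for C = [[a, -b], [b, a]], a and b not both zero."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    # |lambda| is the distance from the origin to (a, b); phi is its polar angle.
    return math.hypot(a, b), math.atan2(b, a)

def rebuild(scale, phi):
    """Multiply the scaling matrix by the rotation matrix to recover C."""
    c, s = math.cos(phi), math.sin(phi)
    return [[scale * c, -scale * s],
            [scale * s, scale * c]]

scale, phi = scale_and_angle(1.0, 1.0)
C = rebuild(scale, phi)
print(scale, phi)   # approximately sqrt(2) and pi/4
print(C)            # approximately [[1, -1], [1, 1]]
```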



The following theorem, whose proof is considered in the exercises, shows that every real 2 x 2 matrix with complex 
eigenvalues is similar to a matrix of form 15. 



THEOREM 5.3.8 



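Let $A$ be a real $2 \times 2$ matrix with complex eigenvalues $\lambda = a \pm bi$ (where $b \ne 0$). If $\mathbf{x}$ is an eigenvector of $A$ corresponding to $\lambda = a - bi$, then the matrix $P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})]$ is invertible and

\[ A = P \begin{bmatrix} a & -b \\ b & a \end{bmatrix} P^{-1} \qquad (17) \]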



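EXAMPLE 5 A Matrix Factorization Using Complex Eigenvalues

Factor the matrix in Example 3 into form 17 using the eigenvalue $\lambda = -i$ and the corresponding eigenvector that was given in 11.

Solution For consistency with the notation in Theorem 5.3.8, let us denote the eigenvector in 11 that corresponds to $\lambda = -i$ by $\mathbf{x}$ (rather than $\overline{\mathbf{x}}$ as before). For this $\lambda$ and $\mathbf{x}$ we have

\[ a = 0, \quad b = 1, \quad \operatorname{Re}(\mathbf{x}) = \begin{bmatrix} -\tfrac{2}{5} \\ 1 \end{bmatrix}, \quad \operatorname{Im}(\mathbf{x}) = \begin{bmatrix} -\tfrac{1}{5} \\ 0 \end{bmatrix} \]

Thus,

\[ P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})] = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix} \]

so $A$ can be factored in form 17 as

\[ \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -5 & -2 \end{bmatrix} \]

You may want to confirm this by multiplying out the right side.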
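The multiplication suggested at the end of Example 5 can be carried out exactly with rational arithmetic. This is a minimal sketch; `matmul` is a hypothetical helper for $2 \times 2$ products, and `F` is shorthand for Python's `Fraction` type.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# The three factors from Example 5: P, the matrix of form 15, and P inverse.
P = [[F(-2, 5), F(-1, 5)],
     [F(1), F(0)]]
C = [[F(0), F(-1)],
     [F(1), F(0)]]
P_inv = [[F(0), F(1)],
         [F(-5), F(-2)]]

product = matmul(matmul(P, C), P_inv)
identity_check = matmul(P, P_inv)

print(product)          # entries equal to -2, -1, 5, 2
print(identity_check)   # entries equal to 1, 0, 0, 1
```

Using fractions instead of floating-point numbers avoids rounding, so the product can be compared with $A$ entry by entry.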



A Geometric Interpretation of Theorem 5.3.8

To clarify what Theorem 5.3.8 says geometrically, let us denote the matrices on the right side of 16 by $S$ and $R_\phi$, respectively, and then use 16 to rewrite 17 as

\[ A = PSR_\phi P^{-1} = P \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} P^{-1} \qquad (18) \]

If we now view $P$ as the transition matrix from the basis $B = \{\operatorname{Re}(\mathbf{x}), \operatorname{Im}(\mathbf{x})\}$ to the standard basis, then 18 tells us that computing a product $A\mathbf{x}_0$ can be broken down into a three-step process:

Step 1 Map $\mathbf{x}_0$ from standard coordinates into B-coordinates by forming the product $P^{-1}\mathbf{x}_0$.
Step 2 Rotate and scale the vector $P^{-1}\mathbf{x}_0$ by forming the product $SR_\phi P^{-1}\mathbf{x}_0$.
Step 3 Map the rotated and scaled vector back to standard coordinates to obtain $A\mathbf{x}_0 = PSR_\phi P^{-1}\mathbf{x}_0$.



Power Sequences 

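There are many problems in which one is interested in how successive applications of a matrix transformation affect a specific vector. For example, if $A$ is the standard matrix for an operator on $R^n$ and $\mathbf{x}_0$ is some fixed vector in $R^n$, then one might be interested in the behavior of the power sequence

\[ \mathbf{x}_0,\; A\mathbf{x}_0,\; A^2\mathbf{x}_0,\; \ldots,\; A^k\mathbf{x}_0,\; \ldots \]

For example, if

\[ A = \begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix} \quad \text{and} \quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

then with the help of a computer or calculator one can show that the first four terms in the power sequence are

\[ \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A\mathbf{x}_0 = \begin{bmatrix} 1.25 \\ 0.5 \end{bmatrix}, \quad A^2\mathbf{x}_0 = \begin{bmatrix} 1.0 \\ -0.2 \end{bmatrix}, \quad A^3\mathbf{x}_0 = \begin{bmatrix} 0.35 \\ -0.82 \end{bmatrix} \]

With the help of MATLAB or a computer algebra system one can show that if the first 100 terms are plotted as ordered pairs $(x, y)$, then the points move along the elliptical path shown in Figure 5.3.4a.

Figure 5.3.4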
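The first few terms of the power sequence are easy to generate by repeated multiplication. The sketch below uses exact fractions for the entries of $A$ and converts to decimals only for printing; `apply` and `power_sequence` are hypothetical helper names.

```python
from fractions import Fraction as F

A = [[F(1, 2), F(3, 4)],
     [F(-3, 5), F(11, 10)]]

def apply(M, x):
    """Product of a 2 x 2 matrix with a vector in R^2."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

def power_sequence(M, x0, count):
    """Return the list x0, M x0, M^2 x0, ... with `count` terms."""
    terms = [x0]
    for _ in range(count - 1):
        terms.append(apply(M, terms[-1]))   # each term is M times the previous one
    return terms

terms = power_sequence(A, [F(1), F(1)], 4)
for t in terms:
    print([float(c) for c in t])
```

Each term is obtained from the previous one by a single matrix-vector product, so no matrix powers need to be formed explicitly.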



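To understand why the points move along an elliptical path, we will need to examine the eigenvalues and eigenvectors of $A$. We leave it for you to show that the eigenvalues of $A$ are $\lambda = \tfrac{4}{5} \pm \tfrac{3}{5}i$ and that the corresponding eigenvectors are

\[ \lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i: \; \mathbf{v}_1 = \left(\tfrac{1}{2} + i,\; 1\right) \quad \text{and} \quad \lambda_2 = \tfrac{4}{5} + \tfrac{3}{5}i: \; \mathbf{v}_2 = \left(\tfrac{1}{2} - i,\; 1\right) \]

If we take $\lambda = \lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i$ and $\mathbf{x} = \mathbf{v}_1 = \left(\tfrac{1}{2} + i, 1\right)$ in 17 and use the fact that $|\lambda| = 1$, then we obtain the factorization

\[ \begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & -\tfrac{1}{2} \end{bmatrix} \qquad (19) \]

that is, $A = PR_\phi P^{-1}$, where $R_\phi$ is a rotation about the origin through the angle $\phi$ whose tangent is

\[ \tan\phi = \frac{\sin\phi}{\cos\phi} = \frac{3/5}{4/5} = \frac{3}{4} \qquad \left(\phi = \tan^{-1}\tfrac{3}{4} \approx 36.9^\circ\right) \]

The matrix $P$ in 19 is the transition matrix from the basis

\[ B = \{\operatorname{Re}(\mathbf{x}), \operatorname{Im}(\mathbf{x})\} = \left\{\left(\tfrac{1}{2}, 1\right), (1, 0)\right\} \]

to the standard basis, and $P^{-1}$ is the transition matrix from the standard basis to the basis $B$ (Figure 5.3.5). Next, observe that if $n$ is a positive integer, then 19 implies that

\[ A^n\mathbf{x}_0 = (PR_\phi P^{-1})^n\mathbf{x}_0 = PR_\phi^n P^{-1}\mathbf{x}_0 \]

so the product $A^n\mathbf{x}_0$ can be computed by first mapping $\mathbf{x}_0$ into the point $P^{-1}\mathbf{x}_0$ in B-coordinates, then multiplying by $R_\phi^n$ to rotate this point about the origin through the angle $n\phi$, and then multiplying $R_\phi^n P^{-1}\mathbf{x}_0$ by $P$ to map the resulting point back to standard coordinates. We can now see what is happening geometrically: In B-coordinates each successive multiplication by $A$ causes the point $P^{-1}\mathbf{x}_0$ to advance through an angle $\phi$, thereby tracing a circular orbit about the origin. However, the basis $B$ is skewed (not orthogonal), so when the points on the circular orbit are transformed back to standard coordinates, the effect is to distort the circular orbit into the elliptical orbit traced by $A^n\mathbf{x}_0$ (Figure 5.3.4b). Here are the computations for the first step (successive steps are illustrated in Figure 5.3.4c):

\[ \begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

\[ = \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix} \begin{bmatrix} 1 \\ \tfrac{1}{2} \end{bmatrix} \qquad [\mathbf{x}_0 \text{ is mapped to B-coordinates.}] \]

\[ = \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \qquad [\text{The point } \left(1, \tfrac{1}{2}\right) \text{ is rotated through the angle } \phi.] \]

\[ = \begin{bmatrix} \tfrac{5}{4} \\ \tfrac{1}{2} \end{bmatrix} \qquad [\text{The point } \left(\tfrac{1}{2}, 1\right) \text{ is mapped to standard coordinates.}] \]

Figure 5.3.5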
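The claim that the orbit is circular in B-coordinates can be tested directly: the squared distance of $P^{-1}A^n\mathbf{x}_0$ from the origin should be the same for every $n$. The sketch below checks this for the first six terms using exact fractions; the helper name `apply` and the number of terms are arbitrary choices made for this illustration.

```python
from fractions import Fraction as F

A = [[F(1, 2), F(3, 4)],
     [F(-3, 5), F(11, 10)]]
# Transition matrix from standard coordinates to B-coordinates (P inverse in 19).
P_inv = [[F(0), F(1)],
         [F(1), F(-1, 2)]]

def apply(M, x):
    """Product of a 2 x 2 matrix with a vector in R^2."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

x = [F(1), F(1)]                 # the starting vector x0
radii_squared = []
for _ in range(6):
    y = apply(P_inv, x)          # current point in B-coordinates
    radii_squared.append(y[0] ** 2 + y[1] ** 2)
    x = apply(A, x)              # advance to the next term of the power sequence

print(radii_squared)             # the same value, 5/4, six times
```

In standard coordinates the corresponding squared distances are not constant, which reflects the distortion of the circle into an ellipse by the skewed basis.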



Concept Review 

• Real part of z 

• Imaginary part of z 

• Modulus of z 

• Complex conjugate of z 

• Argument of z 

• Polar form of z 

• Complex vector space 

• Complex n-tuple 

• Complex n-space

• Real matrix

• Complex matrix

• Complex Euclidean inner product

• Euclidean norm on $C^n$

• Antisymmetry property

• Complex eigenvalue

• Complex eigenvector

• Eigenspace in $C^n$

• Discriminant 
Skills 

• Find the real part, imaginary part, and complex conjugate of a complex matrix or vector. 

• Find the determinant of a complex matrix. 

• Find complex inner products and norms of complex vectors. 

• Find the eigenvalues and bases for the eigenspaces of complex matrices. 

• Factor a $2 \times 2$ real matrix with complex eigenvalues into a product of a scaling matrix and a rotation matrix.



Exercise Set 5.3 



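In Exercises 1-2, find $\overline{\mathbf{u}}$, $\operatorname{Re}(\mathbf{u})$, $\operatorname{Im}(\mathbf{u})$, and $\|\mathbf{u}\|$.
1. $\mathbf{u} = (2 - i,\; 4i,\; 1 + i)$
Answer:

$\overline{\mathbf{u}} = (2 + i,\; -4i,\; 1 - i)$, $\operatorname{Re}(\mathbf{u}) = (2, 0, 1)$, $\operatorname{Im}(\mathbf{u}) = (-1, 4, 1)$, $\|\mathbf{u}\| = \sqrt{23}$

2. $\mathbf{u} = (6,\; 1 + 4i,\; 6 - 2i)$

In Exercises 3-4, show that $\mathbf{u}$, $\mathbf{v}$, and $k$ satisfy Theorem 5.3.1.

3. $\mathbf{u} = (3 - 4i,\; 2 + i,\; -6i)$, $\mathbf{v} = (1 + i,\; 2 - i,\; 4)$, $k = i$
4. $\mathbf{u} = (6,\; 1 + 4i,\; 6 - 2i)$, $\mathbf{v} = (4,\; 3 + 2i,\; i - 3)$, $k =$

5. Solve the equation $i\mathbf{x} - 3\mathbf{v} = \overline{\mathbf{u}}$ for $\mathbf{x}$, where $\mathbf{u}$ and $\mathbf{v}$ are the vectors in Exercise 3.
Answer:

$\mathbf{x} = (7 - 6i,\; -4 - 8i,\; 6 - 12i)$

6. Solve the equation $(1 + i)\mathbf{x} + 2\mathbf{u} = \mathbf{v}$ for $\mathbf{x}$, where $\mathbf{u}$ and $\mathbf{v}$ are the vectors in Exercise 4.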

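In Exercises 7-8, find $\overline{A}$, $\operatorname{Re}(A)$, $\operatorname{Im}(A)$, $\det(A)$, and $\operatorname{tr}(A)$.

7. $A = \begin{bmatrix} -5i & 4 \\ 2 - i & 1 + 5i \end{bmatrix}$

Answer:

$\overline{A} = \begin{bmatrix} 5i & 4 \\ 2 + i & 1 - 5i \end{bmatrix}$, $\operatorname{Re}(A) = \begin{bmatrix} 0 & 4 \\ 2 & 1 \end{bmatrix}$, $\operatorname{Im}(A) = \begin{bmatrix} -5 & 0 \\ -1 & 5 \end{bmatrix}$, $\det(A) = 17 - i$, $\operatorname{tr}(A) = 1$

8. $A = \begin{bmatrix} 4i & 2 - 3i \\ 2 + 3i & 1 \end{bmatrix}$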



9. Let $A$ be the matrix given in Exercise 7, and let $B$ be the matrix

\[ B = \begin{bmatrix} 1 - i \\ 2i \end{bmatrix} \]

Confirm that these matrices have the properties stated in Theorem 5.3.2.

10. Let $A$ be the matrix given in Exercise 8, and let $B$ be the matrix

\[ B = \begin{bmatrix} 5i \\ 1 - 4i \end{bmatrix} \]

Confirm that these matrices have the properties stated in Theorem 5.3.2.



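In Exercises 11-12, compute $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{u} \cdot \mathbf{w}$, and $\mathbf{v} \cdot \mathbf{w}$, and show that the vectors satisfy Formula 5 and parts (a), (b), and (c) of Theorem 5.3.3.

11. $\mathbf{u} = (i,\; 2i,\; 3)$, $\mathbf{v} = (4,\; -2i,\; 1 + i)$, $\mathbf{w} = (2 - i,\; 2i,\; 5 + 3i)$, $k = 2i$
Answer:

$\mathbf{u} \cdot \mathbf{v} = -1 + i$, $\mathbf{u} \cdot \mathbf{w} = 18 - 7i$, $\mathbf{v} \cdot \mathbf{w} = 12 + 6i$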

12. $\mathbf{u} = (1 + i,\; 4,\; 3i)$, $\mathbf{v} = (3,\; -4i,\; 2 + 3i)$, $\mathbf{w} = (1 - i,\; 4i,\; 4 - 5i)$, $k = 1 + i$



13. Compute (u • v) — w u for the vectors u, v, and w in Exercise 1 1 . 

Answer: 
-ll-14i 



14. Compute (in • wj + (||u||v) • u for the vectors u, v, and w in Exercise 12. 
In Exercises 15-18, find the eigenvalues and bases for the eigenspaces of $A$.

15. $A = \begin{bmatrix} 4 & -5 \\ 1 & 0 \end{bmatrix}$



Answer: 



"H 1] 



Answer: 



18. 



-[-3 I] 



In Exercises 19-22, each matrix $C$ has form 15. Theorem 5.3.7 implies that $C$ is the product of a scaling matrix with factor $|\lambda|$ and a rotation matrix with angle $\phi$. Find $|\lambda|$ and $\phi$ for which $-\pi < \phi \le \pi$.



19. 



Answer: 

|A| = /2.0 = J 

■C=\ ' 
[-5 Oj 



21. 



C = 



1 ^] 



Answer: 

|A| = 2.^=-| 



22. 



C = 



-f2 f2 



In Exercises 23-26, find an invertible matrix $P$ and a matrix $C$ of form 15 such that $A = PCP^{-1}$.
23. 



Answer 
P 



24 

25, 



-2 

2 

4 - 



=[ 

=[-3 3] 



26. 



Answer: 
P = 
A = 



1 


-1 


-1 


0 


5 ■ 


-2" 


1 


3 



27. Find all complex scalars $k$, if any, for which $\mathbf{u}$ and $\mathbf{v}$ are orthogonal in $C^3$.

(a) ^ = (2^ V = (i, 6j, 

(b) u=(i,*, 1+0, v=(l, -1, 



Answer: 



(a) k= -|i 



(b) None 

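28. Show that if $A$ is a real $m \times n$ matrix and $\mathbf{x}$ is a column vector in $C^n$, then $\operatorname{Re}(A\mathbf{x}) = A(\operatorname{Re}(\mathbf{x}))$ and $\operatorname{Im}(A\mathbf{x}) = A(\operatorname{Im}(\mathbf{x}))$.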

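29. The matrices

\[ \sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad \sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \]

called Pauli spin matrices, are used in quantum mechanics to study particle spin. The Dirac matrices, which are also used in quantum mechanics, are expressed in terms of the Pauli spin matrices and the $2 \times 2$ identity matrix $I_2$ as

\[ \beta = \begin{bmatrix} I_2 & 0 \\ 0 & -I_2 \end{bmatrix}, \quad \alpha_x = \begin{bmatrix} 0 & \sigma_1 \\ \sigma_1 & 0 \end{bmatrix}, \quad \alpha_y = \begin{bmatrix} 0 & \sigma_2 \\ \sigma_2 & 0 \end{bmatrix}, \quad \alpha_z = \begin{bmatrix} 0 & \sigma_3 \\ \sigma_3 & 0 \end{bmatrix} \]

(a) Show that $\beta^2 = \alpha_x^2 = \alpha_y^2 = \alpha_z^2$.

(b) Matrices $A$ and $B$ for which $AB = -BA$ are said to be anticommutative. Show that the Dirac matrices are anticommutative.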

30. If $k$ is a real scalar and $\mathbf{v}$ is a vector in $R^n$, then Theorem 3.2.1 states that $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$. Is this relationship also true if $k$ is a complex scalar and $\mathbf{v}$ is a vector in $C^n$? Justify your answer.

31. Prove part ( c) of Theorem 5.3.1. 

32. Prove Theorem 5.3.2. 

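33. Prove that if $\mathbf{u}$ and $\mathbf{v}$ are vectors in $C^n$, then

\[ \mathbf{u} \cdot \mathbf{v} = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2 + \tfrac{i}{4}\|\mathbf{u} + i\mathbf{v}\|^2 - \tfrac{i}{4}\|\mathbf{u} - i\mathbf{v}\|^2 \]

34. It follows from Theorem 5.3.7 that the eigenvalues of the rotation matrix

\[ R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \]

are $\lambda = \cos\phi \pm i\sin\phi$. Prove that if $\mathbf{x}$ is an eigenvector corresponding to either eigenvalue, then $\operatorname{Re}(\mathbf{x})$ and $\operatorname{Im}(\mathbf{x})$ are orthogonal and have the same length. [Note: This implies that $P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})]$ is a real scalar multiple of an orthogonal matrix.]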

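35. The two parts of this exercise lead you through a proof of Theorem 5.3.8.

(a) For notational simplicity, let

\[ M = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \]

and let $\mathbf{u} = \operatorname{Re}(\mathbf{x})$ and $\mathbf{v} = \operatorname{Im}(\mathbf{x})$, so $P = [\mathbf{u} \mid \mathbf{v}]$. Show that the relationship $A\mathbf{x} = \lambda\mathbf{x}$ implies that

\[ A\mathbf{x} = (a\mathbf{u} + b\mathbf{v}) + i(-b\mathbf{u} + a\mathbf{v}) \]

and then equate real and imaginary parts in this equation to show that

\[ AP = [A\mathbf{u} \mid A\mathbf{v}] = [a\mathbf{u} + b\mathbf{v} \mid -b\mathbf{u} + a\mathbf{v}] = PM \]

(b) Show that $P$ is invertible, thereby completing the proof, since the result in part (a) implies that $A = PMP^{-1}$. [Hint: If $P$ is not invertible, then one of its column vectors is a real scalar multiple of the other, say $\mathbf{v} = c\mathbf{u}$. Substitute this into the equations $A\mathbf{u} = a\mathbf{u} + b\mathbf{v}$ and $A\mathbf{v} = -b\mathbf{u} + a\mathbf{v}$ obtained in part (a), and show that $(1 + c^2)b\mathbf{u} = \mathbf{0}$. Finally, show that this leads to a contradiction, thereby proving that $P$ is invertible.]

36. In this problem you will prove the complex analog of the Cauchy-Schwarz inequality.
(a) Prove: If $k$ is a complex number, and $\mathbf{u}$ and $\mathbf{v}$ are vectors in $C^n$, then

\[ (\mathbf{u} - k\mathbf{v}) \cdot (\mathbf{u} - k\mathbf{v}) = \mathbf{u} \cdot \mathbf{u} - \overline{k}(\mathbf{u} \cdot \mathbf{v}) - k\,\overline{(\mathbf{u} \cdot \mathbf{v})} + k\overline{k}(\mathbf{v} \cdot \mathbf{v}) \]

(b) Use the result in part (a) to prove that

\[ 0 \le \mathbf{u} \cdot \mathbf{u} - \overline{k}(\mathbf{u} \cdot \mathbf{v}) - k\,\overline{(\mathbf{u} \cdot \mathbf{v})} + k\overline{k}(\mathbf{v} \cdot \mathbf{v}) \]

(c) Take $k = (\mathbf{u} \cdot \mathbf{v}) / (\mathbf{v} \cdot \mathbf{v})$ in part (b) to prove that

\[ |\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\| \]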

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 



(a) There is a real 5x5 matrix with no real eigenvalues. 



Answer: 



False 

(b) The eigenvalues of a $2 \times 2$ complex matrix are the solutions of the equation $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$.
Answer: 

True 

(c) Matrices that have the same complex eigenvalues with the same algebraic multiplicities have the same trace. 
Answer: 

False 

(d) If $\lambda$ is a complex eigenvalue of a real matrix $A$ with a corresponding complex eigenvector $\mathbf{v}$, then $\overline{\lambda}$ is a complex eigenvalue of $A$ and $\overline{\mathbf{v}}$ is a complex eigenvector of $A$ corresponding to $\overline{\lambda}$.

Answer: 

True 

(e) Every eigenvalue of a complex symmetric matrix is real. 
Answer: 

False 

(f) If a $2 \times 2$ real matrix $A$ has complex eigenvalues and $\mathbf{x}_0$ is a vector in $R^2$, then the vectors $\mathbf{x}_0, A\mathbf{x}_0, A^2\mathbf{x}_0, \ldots, A^n\mathbf{x}_0, \ldots$ lie on an ellipse.

Answer: 

False 

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



5.4 Differential Equations 



Many laws of physics, chemistry, biology, engineering, and economics are described in terms of "differential 
equations" — that is, equations involving functions and their derivatives. In this section we will illustrate one way in 
which linear algebra, eigenvalues and eigenvectors can be applied to solving systems of differential equations. 
Calculus is a prerequisite for this section. 



Terminology 

Recall from calculus that a differential equation is an equation involving unknown functions and their derivatives. 
The order of a differential equation is the order of the highest derivative it contains. The simplest differential 
equations are the first-order equations of the form 

y' = ay (1) 

where $y = f(x)$ is an unknown differentiable function to be determined, $y' = dy/dx$ is its derivative, and $a$ is a
constant. As with most differential equations, this equation has infinitely many solutions; they are the functions of the
form

\[ y = ce^{ax} \tag{2} \]

where $c$ is an arbitrary constant. That every function of this form is a solution of 1 follows from the computation

\[ y' = cae^{ax} = ay \]

and that these are the only solutions is shown in the exercises. Accordingly, we call 2 the general solution of 1. As an
example, the general solution of the differential equation $y' = 5y$ is

\[ y = ce^{5x} \tag{3} \]

Often, a physical problem that leads to a differential equation imposes some conditions that enable us to isolate one 
particular solution from the general solution. For example, if we require that solution 3 of the equation $y' = 5y$
satisfy the added condition

\[ y(0) = 6 \tag{4} \]

(that is, $y = 6$ when $x = 0$), then on substituting these values in 3, we obtain $6 = ce^{0} = c$, from which we conclude
that

\[ y = 6e^{5x} \]

is the only solution of $y' = 5y$ that satisfies 4.

A condition such as 4, which specifies the value of the general solution at a point, is called an initial condition, and
the problem of solving a differential equation subject to an initial condition is called an initial-value problem. 
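
As a quick numerical illustration of an initial-value problem, the following sketch builds the solution of $y' = ay$ that passes through a given point and checks it against the differential equation with a finite-difference derivative. The function names `solve_scalar_ivp` and `derivative` are illustrative choices, not notation from the text.

```python
import math

def solve_scalar_ivp(a, x0, y0):
    """Return y(x) = c*e^(a*x), with c chosen so that y(x0) = y0."""
    c = y0 / math.exp(a * x0)
    return lambda x: c * math.exp(a * x)

def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# The problem from the text: y' = 5y with y(0) = 6
y = solve_scalar_ivp(a=5.0, x0=0.0, y0=6.0)

print(y(0.0))                           # 6.0, the initial condition
print(derivative(y, 0.3), 5 * y(0.3))   # the two values should nearly agree
```

The finite-difference check only confirms the equation approximately; the exact argument is the computation $y' = cae^{ax} = ay$ given above.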



First-Order Linear Systems 



In this section we will be concerned with solving systems of differential equations of the form 



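\[
\begin{aligned}
y_1' &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\
y_2' &= a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\
&\;\;\vdots \\
y_n' &= a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n
\end{aligned}
\tag{5}
\]

where $y_1 = f_1(x), y_2 = f_2(x), \ldots, y_n = f_n(x)$ are functions to be determined, and the $a_{ij}$'s are constants. In
matrix notation, 5 can be written as

\[
\begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]

or, more briefly, as

\[ \mathbf{y}' = A\mathbf{y} \tag{6} \]

where the notation $\mathbf{y}'$ denotes the vector obtained by differentiating each component of $\mathbf{y}$.

A system of differential equations of form 5 is called a first-order linear system.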



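EXAMPLE 1 Solution of a Linear System with Initial Conditions

(a) Write the following system in matrix form:

\[
\begin{aligned}
y_1' &= 3y_1 \\
y_2' &= -2y_2 \\
y_3' &= 5y_3
\end{aligned}
\tag{7}
\]

(b) Solve the system.

(c) Find a solution of the system that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 4$, and $y_3(0) = -2$.

Solution

(a)

\[
\begin{bmatrix} y_1' \\ y_2' \\ y_3' \end{bmatrix}
=
\begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
\tag{8}
\]

or

\[
\mathbf{y}' = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} \mathbf{y}
\tag{9}
\]

(b) Because each equation in 7 involves only one unknown function, we can solve the equations
individually. It follows from 2 that these solutions are

\[ y_1 = c_1e^{3x}, \qquad y_2 = c_2e^{-2x}, \qquad y_3 = c_3e^{5x} \]

or, in matrix notation,

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
= \begin{bmatrix} c_1e^{3x} \\ c_2e^{-2x} \\ c_3e^{5x} \end{bmatrix}
\tag{10}
\]

(c) From the given initial conditions, we obtain

\[
\begin{aligned}
1 &= y_1(0) = c_1e^{0} = c_1 \\
4 &= y_2(0) = c_2e^{0} = c_2 \\
-2 &= y_3(0) = c_3e^{0} = c_3
\end{aligned}
\]

so the solution satisfying these conditions is

\[ y_1 = e^{3x}, \qquad y_2 = 4e^{-2x}, \qquad y_3 = -2e^{5x} \]

or, in matrix notation,

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
= \begin{bmatrix} e^{3x} \\ 4e^{-2x} \\ -2e^{5x} \end{bmatrix}
\]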

Solution by Diagonalization 

What made the system in Example 1 easy to solve was the fact that each equation involved only one of the unknown 
functions, so its matrix formulation, y' = Ay , had a diagonal coefficient matrix A [Formula 9]. A more complicated 
situation occurs when some or all of the equations in the system involve more than one of the unknown functions, for 
in this case the coefficient matrix is not diagonal. Let us now consider how we might solve such a system. 

The basic idea for solving a system $\mathbf{y}' = A\mathbf{y}$ whose coefficient matrix $A$ is not diagonal is to introduce a new
unknown vector $\mathbf{u}$ that is related to the unknown vector $\mathbf{y}$ by an equation of the form $\mathbf{y} = P\mathbf{u}$, in which $P$ is an
invertible matrix that diagonalizes $A$. Of course, such a matrix may or may not exist, but if it does, then we can rewrite
the equation $\mathbf{y}' = A\mathbf{y}$ as

\[ P\mathbf{u}' = A(P\mathbf{u}) \]

or alternatively as

\[ \mathbf{u}' = (P^{-1}AP)\mathbf{u} \]

Since $P$ is assumed to diagonalize $A$, this equation has the form

\[ \mathbf{u}' = D\mathbf{u} \]

where $D$ is diagonal. We can now solve this equation for $\mathbf{u}$ using the method of Example 1, and then obtain $\mathbf{y}$ by
matrix multiplication using the relationship $\mathbf{y} = P\mathbf{u}$.

In summary, we have the following procedure for solving a system $\mathbf{y}' = A\mathbf{y}$ in the case where $A$ is diagonalizable.

A Procedure for Solving $\mathbf{y}' = A\mathbf{y}$ If $A$ Is Diagonalizable

Step 1. Find a matrix $P$ that diagonalizes $A$.

Step 2. Make the substitutions $\mathbf{y} = P\mathbf{u}$ and $\mathbf{y}' = P\mathbf{u}'$ to obtain a new "diagonal system" $\mathbf{u}' = D\mathbf{u}$, where
$D = P^{-1}AP$.

Step 3. Solve $\mathbf{u}' = D\mathbf{u}$.

Step 4. Determine $\mathbf{y}$ from the equation $\mathbf{y} = P\mathbf{u}$.



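EXAMPLE 2 Solution Using Diagonalization

(a) Solve the system

\[
\begin{aligned}
y_1' &= y_1 + y_2 \\
y_2' &= 4y_1 - 2y_2
\end{aligned}
\]

(b) Find the solution that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 6$.

Solution

(a) The coefficient matrix for the system is

\[ A = \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix} \]

As discussed in Section 5.2, $A$ will be diagonalized by any matrix $P$ whose columns are linearly
independent eigenvectors of $A$. Since

\[
\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{vmatrix}
= \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2)
\]

the eigenvalues of $A$ are $\lambda = 2$ and $\lambda = -3$. By definition,

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

is an eigenvector of $A$ corresponding to $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution of

\[
\begin{bmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

If $\lambda = 2$, this system becomes

\[
\begin{bmatrix} 1 & -1 \\ -4 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

Solving this system yields $x_1 = t$, $x_2 = t$, so

\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

Thus,

\[ \mathbf{p}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to $\lambda = 2$. Similarly, you can show that

\[ \mathbf{p}_2 = \begin{bmatrix} -\tfrac{1}{4} \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to $\lambda = -3$. Thus,

\[ P = \begin{bmatrix} 1 & -\tfrac{1}{4} \\ 1 & 1 \end{bmatrix} \]

diagonalizes $A$, and

\[ D = P^{-1}AP = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix} \]

Thus, as noted in Step 2 of the procedure stated above, the substitution

\[ \mathbf{y} = P\mathbf{u} \quad\text{and}\quad \mathbf{y}' = P\mathbf{u}' \]

yields the "diagonal system"

\[
\mathbf{u}' = D\mathbf{u} = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}\mathbf{u}
\quad\text{or}\quad
\begin{aligned} u_1' &= 2u_1 \\ u_2' &= -3u_2 \end{aligned}
\]

From 2 the solution of this system is

\[
\begin{aligned} u_1 &= c_1e^{2x} \\ u_2 &= c_2e^{-3x} \end{aligned}
\quad\text{or}\quad
\mathbf{u} = \begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix}
\]

so the equation $\mathbf{y} = P\mathbf{u}$ yields, as the solution for $\mathbf{y}$,

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} 1 & -\tfrac{1}{4} \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix}
= \begin{bmatrix} c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\ c_1e^{2x} + c_2e^{-3x} \end{bmatrix}
\]

or

\[
\begin{aligned}
y_1 &= c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\
y_2 &= c_1e^{2x} + c_2e^{-3x}
\end{aligned}
\tag{11}
\]

(b) If we substitute the given initial conditions in 11, we obtain

\[
\begin{aligned}
c_1 - \tfrac{1}{4}c_2 &= 1 \\
c_1 + c_2 &= 6
\end{aligned}
\]

Solving this system, we obtain $c_1 = 2$, $c_2 = 4$, so it follows from 11 that the solution satisfying
the initial conditions is

\[
\begin{aligned}
y_1 &= 2e^{2x} - e^{-3x} \\
y_2 &= 2e^{2x} + 4e^{-3x}
\end{aligned}
\]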



Remark Keep in mind that the method of Example 2 works because the coefficient matrix of the system can be 
diagonalized. In cases where this is not so, other methods are required. These are typically discussed in books
devoted to differential equations. 
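
The four-step diagonalization procedure can be sketched in code for the 2 × 2 case. This is a minimal illustration that assumes the coefficient matrix has two distinct real eigenvalues (which guarantees diagonalizability); the helper names `eig2x2` and `solve_system` are hypothetical, and the eigenvectors it picks may be scalar multiples of those chosen in Example 2, which does not change the final solution.

```python
import math

def eig2x2(A):
    """Eigenvalues and eigenvectors of a 2x2 matrix with distinct real eigenvalues."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det            # discriminant of the characteristic polynomial
    if disc <= 0:
        raise ValueError("this sketch handles only distinct real eigenvalues")
    lams = [(tr + math.sqrt(disc)) / 2, (tr - math.sqrt(disc)) / 2]
    vecs = []
    for lam in lams:
        # Pick a nonzero solution of (A - lam*I)x = 0 from a nonzero row.
        if abs(b) > 1e-12 or abs(a - lam) > 1e-12:
            vecs.append((b, lam - a))
        else:
            vecs.append((lam - d, c))
    return lams, vecs

def solve_system(A, y0):
    """Return y(x) solving y' = Ay with y(0) = y0, following Steps 1 to 4."""
    lams, (v1, v2) = eig2x2(A)                       # Step 1: columns of P
    # Steps 2-3: u_i = c_i * exp(lam_i * x). At x = 0, P c = y0 (Cramer's rule).
    det_p = v1[0] * v2[1] - v2[0] * v1[1]
    c1 = (y0[0] * v2[1] - v2[0] * y0[1]) / det_p
    c2 = (v1[0] * y0[1] - y0[0] * v1[1]) / det_p
    def y(x):                                        # Step 4: y = P u
        u1 = c1 * math.exp(lams[0] * x)
        u2 = c2 * math.exp(lams[1] * x)
        return (u1 * v1[0] + u2 * v2[0], u1 * v1[1] + u2 * v2[1])
    return y

# The system of Example 2 with y1(0) = 1, y2(0) = 6
y = solve_system([[1, 1], [4, -2]], (1, 6))
x = 0.5
print(y(x))
print((2 * math.exp(2 * x) - math.exp(-3 * x),
       2 * math.exp(2 * x) + 4 * math.exp(-3 * x)))  # closed form from Example 2
```

The two printed pairs should agree up to rounding, since the constants absorb whatever scaling was chosen for the eigenvectors.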



Concept Review 

• Differential equation 

• Order of a differential equation 

• General solution 

• Particular solution 

• Initial condition 

• Initial-value problem

• First-order linear system 



Skills 

• Find the matrix form of a system of linear differential equations. 

• Find the general solution of a system of linear differential equations by diagonalization. 

• Find the particular solution of a system of linear differential equations satisfying an initial condition. 



Exercise Set 5.4 

^' (a) Solve the system 

y[ = y\ + 

(b) Find the solution that satisfies the initial conditions y j (0) = 0? y2(0) = 0- 
Answer: 

(a) y^ =c\e^^ -2c2e'~^ 

(b) ^1 = 0 
>'2 = 0 

^' (a) Solve the system 



y[ = yi + 3>'2 
y2 = ^yi + 5^2 

(b) Find the solution that satisfies the conditions y j (0) = 2. y2(fi) = 1 • 
'• (a) Solve the system 

y[ = 4yi + 

y2 = -^i + y2 

y'i = -^1 + y2 

(b) Find the solution that satisfies the initial conditions y^ (0) = — 1, ^2(0) = 1> y3(0) = 0- 



Answer: 



(a) 


yi 


= - <:2e^' + C3e^' 




y2 






y3 




(b) 


yi 






yi 






73 





4. Solve the system 

y\ = 4yi + ^2 + 2y2 
72 = ?yi +4^2 + ^3 
= 2>'i + 2;/2 + 473 

5. Show that every solution of $y' = ay$ has the form $y = ce^{ax}$.

[Hint: Let $y = f(x)$ be a solution of the equation, and show that $f(x)e^{-ax}$ is constant.]

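6. Show that if $A$ is diagonalizable and

\[ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \]

is a solution of the system $\mathbf{y}' = A\mathbf{y}$, then each $y_i$ is a linear combination of $e^{\lambda_1 x}, e^{\lambda_2 x}, \ldots, e^{\lambda_n x}$, where
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$.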

7. Sometimes it is possible to solve a single higher-order linear differential equation with constant coefficients by
expressing it as a system and applying the methods of this section. For the differential equation $y'' - y' - 6y = 0$,
show that the substitutions $y_1 = y$ and $y_2 = y'$ lead to the system

\[
\begin{aligned}
y_1' &= y_2 \\
y_2' &= 6y_1 + y_2
\end{aligned}
\]

Solve this system, and use the result to solve the original differential equation. 



Answer: 



8. Use the procedure in Exercise 7 to solve — 12>' = 0. 

9. Explain how you might use the procedure in Exercise 7 to solve y — 67" + 1 — = 0. Use your 
procedure to solve the equation. 

Answer: 

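10. (a) By rewriting 11 in matrix form, show that the solution of the system in Example 2 can be expressed as

\[
\mathbf{y} = c_1e^{2x}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2e^{-3x}\begin{bmatrix} -\tfrac{1}{4} \\ 1 \end{bmatrix}
\]

This is called the general solution of the system.

(b) Note that in part (a), the vector in the first term is an eigenvector corresponding to the eigenvalue $\lambda_1 = 2$, and
the vector in the second term is an eigenvector corresponding to the eigenvalue $\lambda_2 = -3$. This is a special
case of the following general result:

Theorem. If the coefficient matrix $A$ of the system $\mathbf{y}' = A\mathbf{y}$ is diagonalizable, then the general
solution of the system can be expressed as

\[ \mathbf{y} = c_1e^{\lambda_1 x}\mathbf{x}_1 + c_2e^{\lambda_2 x}\mathbf{x}_2 + \cdots + c_ne^{\lambda_n x}\mathbf{x}_n \]

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$, and $\mathbf{x}_i$ is an eigenvector of $A$ corresponding to $\lambda_i$.

Prove this result by tracing through the four-step procedure preceding Example 2 with

\[
D = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
\quad\text{and}\quad
P = [\mathbf{x}_1 \mid \mathbf{x}_2 \mid \cdots \mid \mathbf{x}_n]
\]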

11. Consider the system of differential equations $\mathbf{y}' = A\mathbf{y}$, where $A$ is a 2 × 2 matrix. For what values of
$a_{11}, a_{12}, a_{21}, a_{22}$ do the component solutions $y_1(t)$, $y_2(t)$ tend to zero as $t \to \infty$? In particular, what must be
true about the determinant and the trace of $A$ for this to happen?

12. Solve the nondiagonalizable system 



^1 = y\ 

y2 = y2 



True-False Exercises 



In parts (a)-(e) determine whether the statement is true or false, and justify your answer.

(a) Every system of differential equations $\mathbf{y}' = A\mathbf{y}$ has a solution.



Answer: 

False 

(b) If $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then $\mathbf{x} = \mathbf{y}$.
Answer: 

False 

(c) If $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then $(c\mathbf{x} + d\mathbf{y})' = A(c\mathbf{x} + d\mathbf{y})$ for all scalars $c$ and $d$.
Answer: 

True 

(d) If $A$ is a square matrix with distinct real eigenvalues, then it is possible to solve $\mathbf{x}' = A\mathbf{x}$ by diagonalization.
Answer: 

True 

(e) If $A$ and $P$ are similar matrices, then $\mathbf{y}' = A\mathbf{y}$ and $\mathbf{u}' = P\mathbf{u}$ have the same solutions.
Answer: 

False 
Chapter 5 Supplementary Exercises

1. (a) Show that if $0 < \theta < \pi$, then

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

has no eigenvalues and consequently no eigenvectors.

(b) Give a geometric explanation of the result in part (a).

Answer:

(b) The transformation rotates vectors through the angle $\theta$; therefore, if $0 < \theta < \pi$, then no nonzero vector
is transformed into a vector in the same or opposite direction.

2. Find the eigenvalues of 





0 


1 


0 


A = 


0 


0 


1 






-3k^ 


3k 



3. (a) Show that if $D$ is a diagonal matrix with nonnegative entries on the main diagonal, then there is a
matrix $S$ such that $S^2 = D$.

(b) Show that if $A$ is a diagonalizable matrix with nonnegative eigenvalues, then there is a matrix $S$ such
that $S^2 = A$.

(c) Find a matrix $S$ such that $S^2 = A$, given that

\[ A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 4 & 5 \\ 0 & 0 & 9 \end{bmatrix} \]

Answer:

(c)
\[ S = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix} \]

4. Prove: If $A$ is a square matrix, then $A$ and $A^T$ have the same characteristic polynomial.

5. Prove: If $A$ is a square matrix and $p(\lambda) = \det(\lambda I - A)$ is the characteristic polynomial of $A$, then the
coefficient of $\lambda^{n-1}$ in $p(\lambda)$ is the negative of the trace of $A$.

6. Prove: If $b \ne 0$, then

\[ A = \begin{bmatrix} a & b \\ 0 & a \end{bmatrix} \]

is not diagonalizable.

7. In advanced linear algebra, one proves the Cayley-Hamilton Theorem, which states that a square matrix
$A$ satisfies its characteristic equation; that is, if

\[ c_0 + c_1\lambda + c_2\lambda^2 + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n = 0 \]

is the characteristic equation of $A$, then

\[ c_0I + c_1A + c_2A^2 + \cdots + c_{n-1}A^{n-1} + A^n = 0 \]

Verify this result for

(a) $A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}$

In Exercises 8-10, use the Cayley-Hamilton Theorem, stated in Exercise 7.

8. (a) Use Exercise 18 of Section 5.1 to prove the Cayley-Hamilton Theorem for 2 × 2 matrices.
(b) Prove the Cayley-Hamilton Theorem for $n \times n$ diagonalizable matrices.

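9. The Cayley-Hamilton Theorem provides a method for calculating powers of a matrix. For example, if $A$
is a 2 × 2 matrix with characteristic equation

\[ c_0 + c_1\lambda + \lambda^2 = 0 \]

then $c_0I + c_1A + A^2 = 0$, so

\[ A^2 = -c_1A - c_0I \]

Multiplying through by $A$ yields $A^3 = -c_1A^2 - c_0A$, which expresses $A^3$ in terms of $A^2$ and $A$, and
multiplying through by $A^2$ yields $A^4 = -c_1A^3 - c_0A^2$, which expresses $A^4$ in terms of $A^3$ and $A^2$.
Continuing in this way, we can calculate successive powers of $A$ by expressing them in terms of lower
powers. Use this procedure to calculate $A^2$, $A^3$, $A^4$, and $A^5$ for

\[ A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix} \]

Answer:

\[
A^2 = \begin{bmatrix} 15 & 30 \\ 5 & 10 \end{bmatrix},\quad
A^3 = \begin{bmatrix} 75 & 150 \\ 25 & 50 \end{bmatrix},\quad
A^4 = \begin{bmatrix} 375 & 750 \\ 125 & 250 \end{bmatrix},\quad
A^5 = \begin{bmatrix} 1875 & 3750 \\ 625 & 1250 \end{bmatrix}
\]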



10. Use the method of the preceding exercise to calculate ji^ and for 

"0 1 0" 



A = 



0 0 1 

1 -3 3 



11. Find the eigenvalues of the matrix 



A = 



c\ C2 
i i 



\ 



Answer: 



0, tr(^) 

12. (a) It was shown in Exercise 17 of Section 5.1 that if $A$ is an $n \times n$ matrix, then the coefficient of $\lambda^n$ in
the characteristic polynomial of $A$ is 1. (A polynomial with this property is called monic.) Show that
the matrix

\[
\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & 0 & \cdots & 0 & -c_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -c_{n-1}
\end{bmatrix}
\]

has characteristic polynomial

\[ p(\lambda) = c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n \]

This shows that every monic polynomial is the characteristic polynomial of some matrix. The matrix
in this example is called the companion matrix of $p(\lambda)$. [Hint: Evaluate all determinants in the
problem by adding a multiple of the second row to the first to introduce a zero at the top of the first
column, and then expanding by cofactors along the first column.]

(b) Find a matrix with characteristic polynomial 

13. A square matrix $A$ is called nilpotent if $A^n = 0$ for some positive integer $n$. What can you say about the
eigenvalues of a nilpotent matrix? 

Answer: 

They are all 0. 

14. Prove: If $A$ is an $n \times n$ matrix and $n$ is odd, then $A$ has at least one real eigenvalue.



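15. Find a 3 × 3 matrix $A$ that has eigenvalues $\lambda = 0$, $1$, and $-1$ with corresponding eigenvectors

\[
\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix},\quad
\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix},\quad
\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
\]

respectively.

Answer:

\[
\begin{bmatrix}
1 & 0 & 0 \\
-1 & -\tfrac{1}{2} & -\tfrac{1}{2} \\
1 & -\tfrac{1}{2} & -\tfrac{1}{2}
\end{bmatrix}
\]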

16. Suppose that a 4 × 4 matrix $A$ has eigenvalues $\lambda_1 = 1$, $\lambda_2 = -2$, $\lambda_3 = 3$, and $\lambda_4 = -3$.

(a) Use the method of Exercise 16 of Section 5.1 to find $\det(A)$.

(b) Use Exercise 5 above to find $\operatorname{tr}(A)$.

17. Let $A$ be a square matrix such that $A^3 = A$. What can you say about the eigenvalues of $A$?

Answer:

They are all 0, 1, or −1.

18. (a) Solve the system

\[
\begin{aligned}
y_1' &= y_1 + 3y_2 \\
y_2' &= 2y_1 + 4y_2
\end{aligned}
\]

(b) Find the solution satisfying the initial conditions $y_1(0) = 5$ and $y_2(0) = 6$.
CHAPTER 6

Inner Product Spaces



CHAPTER CONTENTS 

6.1. Inner Products 

6.2. Angle and Orthogonality in Inner Product Spaces 

6.3. Gram-Schmidt Process; QR-Decomposition

6.4. Best Approximation; Least Squares 

6.5. Least Squares Fitting to Data 

6.6. Function Approximation; Fourier Series 



INTRODUCTION 

In Chapter 3 we defined the dot product of vectors in $R^n$, and we used that concept to
define notions of length, angle, distance, and orthogonality. In this chapter we will
generalize those ideas so they are applicable in any vector space, not just $R^n$. We will also
discuss various applications of these ideas.

6.1 Inner Products



In this section we will use the most important properties of the dot product on $R^n$ as axioms, which, if satisfied by the vectors
in a vector space V, will enable us to extend the notions of length, distance, angle, and perpendicularity to general vector 
spaces. 



General Inner Products 



In Definition 4 of Section 3.2 we defined the dot product of two vectors in $R^n$, and in Theorem 3.2.2 we listed four
fundamental properties of such products. Our first goal in this section is to extend the notion of a dot product to general real 
vector spaces by using those four properties as axioms. We make the following definition. 



Note that Definition 1 applies only to real vector 
spaces. A definition of inner products on complex 
vector spaces is given in the exercises. Since we will 
have little need for complex vector spaces from this 
point on, you can assume that all vector spaces under 
discussion are real, even though some of the theorems 
are also valid in complex vector spaces. 

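DEFINITION 1

An inner product on a real vector space $V$ is a function that associates a real number $\langle \mathbf{u}, \mathbf{v} \rangle$ with each pair of vectors in
$V$ in such a way that the following axioms are satisfied for all vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$ and all scalars $k$.

1. $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$ [Symmetry axiom]

2. $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$ [Additivity axiom]

3. $\langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$ [Homogeneity axiom]

4. $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$ and $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$ [Positivity axiom]

A real vector space with an inner product is called a real inner product space.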



Because the axioms for a real inner product space are based on properties of the dot product, these inner product space axioms
will be satisfied automatically if we define the inner product of two vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^n$ to be

\[ \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}\cdot\mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n \]

This inner product is commonly called the Euclidean inner product (or the standard inner product) on $R^n$ to distinguish it
from other possible inner products that might be defined on $R^n$. We call $R^n$ with the Euclidean inner product Euclidean
n-space.

Inner products can be used to define notions of norm and distance in a general inner product space just as we did with dot
products in $R^n$. Recall from Formulas 11 and 19 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are vectors in Euclidean n-space, then norm and
distance can be expressed in terms of the dot product as

\[ \|\mathbf{v}\| = \sqrt{\mathbf{v}\cdot\mathbf{v}} \quad\text{and}\quad d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(\mathbf{u} - \mathbf{v})\cdot(\mathbf{u} - \mathbf{v})} \]

Motivated by these formulas we make the following definition.

DEFINITION 2

If $V$ is a real inner product space, then the norm (or length) of a vector $\mathbf{v}$ in $V$ is denoted by $\|\mathbf{v}\|$ and is defined by

\[ \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} \]

and the distance between two vectors is denoted by $d(\mathbf{u}, \mathbf{v})$ and is defined by

\[ d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{\langle \mathbf{u} - \mathbf{v}, \mathbf{u} - \mathbf{v} \rangle} \]

A vector of norm 1 is called a unit vector.



The following theorem, which we state without proof, shows that norms and distances in real inner product spaces have many 
of the properties that you might expect. 



THEOREM 6.1.1 

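If $\mathbf{u}$ and $\mathbf{v}$ are vectors in a real inner product space $V$, and if $k$ is a scalar, then:

(a) $\|\mathbf{v}\| \ge 0$ with equality if and only if $\mathbf{v} = \mathbf{0}$.

(b) $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$.

(c) $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{v}, \mathbf{u})$.

(d) $d(\mathbf{u}, \mathbf{v}) \ge 0$ with equality if and only if $\mathbf{u} = \mathbf{v}$.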



Although the Euclidean inner product is the most important inner product on $R^n$, there are various applications in which it is
desirable to modify it by weighting each term differently. More precisely, if

\[ w_1, w_2, \ldots, w_n \]

are positive real numbers, which we will call weights, and if $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$,
then it can be shown that the formula

\[ \langle \mathbf{u}, \mathbf{v} \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{1} \]

defines an inner product on $R^n$ that we call the weighted Euclidean inner product with weights $w_1, w_2, \ldots, w_n$.

Note that the standard Euclidean inner product is the
special case of the weighted Euclidean inner product in
which all the weights are 1.



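EXAMPLE 1 Weighted Euclidean Inner Product

Let $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ be vectors in $R^2$. Verify that the weighted Euclidean inner product

\[ \langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2 \tag{2} \]

satisfies the four inner product axioms.

Solution

Axiom 1: Interchanging $\mathbf{u}$ and $\mathbf{v}$ in Formula 2 does not change the sum on the right side, so
$\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$.

Axiom 2: If $\mathbf{w} = (w_1, w_2)$, then

\[
\begin{aligned}
\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle &= 3(u_1 + v_1)w_1 + 2(u_2 + v_2)w_2 \\
&= 3(u_1w_1 + v_1w_1) + 2(u_2w_2 + v_2w_2) \\
&= (3u_1w_1 + 2u_2w_2) + (3v_1w_1 + 2v_2w_2) \\
&= \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle
\end{aligned}
\]

Axiom 3:

\[
\begin{aligned}
\langle k\mathbf{u}, \mathbf{v} \rangle &= 3(ku_1)v_1 + 2(ku_2)v_2 \\
&= k(3u_1v_1 + 2u_2v_2) \\
&= k\langle \mathbf{u}, \mathbf{v} \rangle
\end{aligned}
\]

Axiom 4: $\langle \mathbf{v}, \mathbf{v} \rangle = 3(v_1v_1) + 2(v_2v_2) = 3v_1^2 + 2v_2^2 \ge 0$ with equality if and only if $v_1 = v_2 = 0$; that is, if
and only if $\mathbf{v} = \mathbf{0}$.

In Example 1, we are using subscripted w's to
denote the components of the vector $\mathbf{w}$, not the
weights. The weights are the numbers 3 and 2 in
Formula 2.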



An Application of Weighted Euclidean Inner Products 

To illustrate one way in which a weighted Euclidean inner product can arise, suppose that some physical experiment has n 
possible numerical outcomes 

xi,X2,---,x„ 

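and that a series of $m$ repetitions of the experiment yields these values with various frequencies. Specifically, suppose that
$x_1$ occurs $f_1$ times, $x_2$ occurs $f_2$ times, and so forth. Since there are a total of $m$ repetitions of the experiment, it follows that

\[ f_1 + f_2 + \cdots + f_n = m \]

Thus, the arithmetic average of the observed numerical values (denoted by $\bar{x}$) is

\[ \bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{f_1 + f_2 + \cdots + f_n} = \frac{1}{m}(f_1x_1 + f_2x_2 + \cdots + f_nx_n) \tag{3} \]

If we let

\[
\begin{aligned}
\mathbf{f} &= (f_1, f_2, \ldots, f_n) \\
\mathbf{x} &= (x_1, x_2, \ldots, x_n) \\
w_1 &= w_2 = \cdots = w_n = 1/m
\end{aligned}
\]

then 3 can be expressed as the weighted Euclidean inner product

\[ \bar{x} = \langle \mathbf{f}, \mathbf{x} \rangle = w_1f_1x_1 + w_2f_2x_2 + \cdots + w_nf_nx_n \]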



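EXAMPLE 2 Using a Weighted Euclidean Inner Product

It is important to keep in mind that norm and distance depend on the inner product being used. If the inner product
is changed, then the norms and distances between vectors also change. For example, for the vectors $\mathbf{u} = (1, 0)$ and
$\mathbf{v} = (0, 1)$ in $R^2$ with the Euclidean inner product we have

\[ \|\mathbf{u}\| = \sqrt{1^2 + 0^2} = 1 \]

and

\[ d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \|(1, -1)\| = \sqrt{1^2 + (-1)^2} = \sqrt{2} \]

but if we change to the weighted Euclidean inner product

\[ \langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2 \]

we have

\[ \|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = [3(1)(1) + 2(0)(0)]^{1/2} = \sqrt{3} \]

and

\[
\begin{aligned}
d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| &= \langle (1, -1), (1, -1) \rangle^{1/2} \\
&= [3(1)(1) + 2(-1)(-1)]^{1/2} = \sqrt{5}
\end{aligned}
\]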
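
The dependence of norm and distance on the chosen inner product can be checked numerically. The sketch below is an illustration only; the function names `inner`, `norm`, and `dist` are hypothetical helpers, and passing no weights is taken to mean the standard Euclidean inner product.

```python
import math

def inner(u, v, weights=None):
    """Weighted Euclidean inner product; weights=None means all weights are 1."""
    if weights is None:
        weights = [1.0] * len(u)
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def norm(u, weights=None):
    return math.sqrt(inner(u, u, weights))

def dist(u, v, weights=None):
    diff = [a - b for a, b in zip(u, v)]
    return norm(diff, weights)

u, v = (1, 0), (0, 1)
print(norm(u), dist(u, v))                    # Euclidean: 1 and sqrt(2)
print(norm(u, (3, 2)), dist(u, v, (3, 2)))    # weights 3 and 2: sqrt(3) and sqrt(5)
```

The same two vectors have different lengths and are a different distance apart once the weights change, which is the point of the example.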



Unit Circles and Spheres in Inner Product Spaces

If $V$ is an inner product space, then the set of points in $V$ that satisfy

\[ \|\mathbf{u}\| = 1 \]

is called the unit sphere or sometimes the unit circle in $V$.

EXAMPLE 3 Unusual Unit Circles in $R^2$

(a) Sketch the unit circle in an xy-coordinate system in $R^2$ using the Euclidean inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = u_1v_1 + u_2v_2$.

(b) Sketch the unit circle in an xy-coordinate system in $R^2$ using the weighted Euclidean inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{9}u_1v_1 + \tfrac{1}{4}u_2v_2$.

Solution

(a) If $\mathbf{u} = (x, y)$, then $\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \sqrt{x^2 + y^2}$, so the equation of the unit circle is $\sqrt{x^2 + y^2} = 1$, or, on
squaring both sides,

\[ x^2 + y^2 = 1 \]

As expected, the graph of this equation is a circle of radius 1 centered at the origin (Figure 6.1.1a).

(b) If $\mathbf{u} = (x, y)$, then $\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \sqrt{\tfrac{1}{9}x^2 + \tfrac{1}{4}y^2}$, so the equation of the unit circle is
$\sqrt{\tfrac{1}{9}x^2 + \tfrac{1}{4}y^2} = 1$, or, on squaring both sides,

\[ \frac{x^2}{9} + \frac{y^2}{4} = 1 \]

The graph of this equation is the ellipse shown in Figure 6.1.1b.



Figure 6.1.1 (a) The unit circle using the standard Euclidean inner product. (b) The unit circle using a weighted Euclidean inner product.



Remark It may seem odd that the "unit circle" in the second part of the last example turned out to have an elliptical shape. 
This will make more sense if you think of circles and spheres in general vector spaces algebraically (||u|| = 1) rather than 
geometrically. The change in geometry occurs because the norm, not being Euclidean, has the effect of distorting the space that 
we are used to seeing through "Euclidean eyes." 



Inner Products Generated by Matrices 

The Euclidean inner product and the weighted Euclidean inner products are special cases of a general class of inner products
on $R^n$ called matrix inner products. To define this class of inner products, let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $R^n$ that are expressed in
column form, and let $A$ be an invertible $n \times n$ matrix. It can be shown (Exercise 31) that if $\mathbf{u}\cdot\mathbf{v}$ is the Euclidean inner product
on $R^n$, then the formula

\[ \langle \mathbf{u}, \mathbf{v} \rangle = A\mathbf{u}\cdot A\mathbf{v} \tag{4} \]

also defines an inner product; it is called the inner product on $R^n$ generated by $A$.

Recall from Table 1 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are in column form, then $\mathbf{u}\cdot\mathbf{v}$ can be written as $\mathbf{v}^T\mathbf{u}$, from which it follows
that 4 can be expressed as

\[ \langle \mathbf{u}, \mathbf{v} \rangle = (A\mathbf{v})^TA\mathbf{u} \]

or, equivalently, as

\[ \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{v}^TA^TA\mathbf{u} \tag{5} \]

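EXAMPLE 4 Matrices Generating Weighted Euclidean Inner Products

The standard Euclidean and weighted Euclidean inner products are examples of matrix inner products. The
standard Euclidean inner product on $R^n$ is generated by the $n \times n$ identity matrix, since setting $A = I$ in Formula
4 yields

\[ \langle \mathbf{u}, \mathbf{v} \rangle = I\mathbf{u}\cdot I\mathbf{v} = \mathbf{u}\cdot\mathbf{v} \]

and the weighted Euclidean inner product

\[ \langle \mathbf{u}, \mathbf{v} \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{6} \]

is generated by the matrix

\[
A = \begin{bmatrix}
\sqrt{w_1} & 0 & \cdots & 0 \\
0 & \sqrt{w_2} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sqrt{w_n}
\end{bmatrix}
\tag{7}
\]

This can be seen by first observing that $A^TA$ is the $n \times n$ diagonal matrix whose diagonal entries are the weights
$w_1, w_2, \ldots, w_n$ and then observing that 5 simplifies to 6 when $A$ is the matrix in Formula 7.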



EXAMPLE 5 Example 1 Revisited

Every diagonal matrix with positive diagonal
entries generates a weighted inner product.
Why?

The weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$ discussed in Example 1 is the inner product on $R^2$
generated by

\[ A = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix} \]
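
To see Formula 4 in action, the following sketch computes the inner product generated by a matrix as $A\mathbf{u}\cdot A\mathbf{v}$ and compares it with the weighted formula $3u_1v_1 + 2u_2v_2$ of Example 1. The generating matrix used is the diagonal matrix with entries $\sqrt{3}$ and $\sqrt{2}$; the sample vectors and the helper names `mat_vec`, `dot`, and `matrix_inner` are arbitrary illustrative choices.

```python
import math

def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matrix_inner(A, u, v):
    """Inner product generated by A: <u, v> = Au . Av (Formula 4)."""
    return dot(mat_vec(A, u), mat_vec(A, v))

A = [[math.sqrt(3), 0.0], [0.0, math.sqrt(2)]]
u, v = (1.0, -2.0), (4.0, 0.5)                  # arbitrary sample vectors

lhs = matrix_inner(A, u, v)
rhs = 3 * u[0] * v[0] + 2 * u[1] * v[1]         # weighted formula of Example 1
print(lhs, rhs)                                 # equal up to rounding
```

The agreement reflects the fact that $A^TA$ is the diagonal matrix of weights.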




Other Examples of Inner Products 

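So far, we have only considered examples of inner products on $R^n$. We will now consider examples of inner products on some
of the other kinds of vector spaces that we discussed earlier.

EXAMPLE 6 An Inner Product on $M_{nn}$

If $U$ and $V$ are $n \times n$ matrices, then the formula

\[ \langle U, V \rangle = \operatorname{tr}(U^TV) \tag{8} \]

defines an inner product on the vector space $M_{nn}$ (see Definition 8 of Section 1.3 for a definition of trace). This
can be proved by confirming that the four inner product space axioms are satisfied, but you can visualize why
this is so by computing 8 for the 2 × 2 matrices

\[
U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix}
\quad\text{and}\quad
V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}
\]

This yields

\[ \langle U, V \rangle = \operatorname{tr}(U^TV) = u_1v_1 + u_2v_2 + u_3v_3 + u_4v_4 \]

which is just the dot product of the corresponding entries in the two matrices. For example, if

\[
U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\quad\text{and}\quad
V = \begin{bmatrix} -1 & 0 \\ 3 & 2 \end{bmatrix}
\]

then

\[ \langle U, V \rangle = 1(-1) + 2(0) + 3(3) + 4(2) = 16 \]

The norm of a matrix $U$ relative to this inner product is

\[ \|U\| = \langle U, U \rangle^{1/2} = \sqrt{u_1^2 + u_2^2 + u_3^2 + u_4^2} \]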
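
The trace inner product of Example 6 can be computed literally as $\operatorname{tr}(U^TV)$ and compared with the sum of products of corresponding entries. The sketch below does both for the matrices of the example; the helper names are hypothetical and matrices are stored as lists of rows.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def trace_inner(U, V):
    """<U, V> = tr(U^T V) (Formula 8)."""
    return trace(mat_mul(transpose(U), V))

def entrywise_inner(U, V):
    """Sum of products of corresponding entries."""
    return sum(u * v for ru, rv in zip(U, V) for u, v in zip(ru, rv))

U = [[1, 2], [3, 4]]
V = [[-1, 0], [3, 2]]
print(trace_inner(U, V), entrywise_inner(U, V))   # 16 16
print(math.sqrt(trace_inner(U, U)))               # norm of U, i.e. sqrt(30)
```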



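EXAMPLE 7 The Standard Inner Product on $P_n$

If

\[ \mathbf{p} = a_0 + a_1x + \cdots + a_nx^n \quad\text{and}\quad \mathbf{q} = b_0 + b_1x + \cdots + b_nx^n \]

are polynomials in $P_n$, then the following formula defines an inner product on $P_n$ (verify) that we will call the
standard inner product on this space:

\[ \langle \mathbf{p}, \mathbf{q} \rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n \tag{9} \]

The norm of a polynomial $\mathbf{p}$ relative to this inner product is

\[ \|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2} \]

EXAMPLE 8 The Evaluation Inner Product on $P_n$

If

\[ \mathbf{p} = p(x) = a_0 + a_1x + \cdots + a_nx^n \quad\text{and}\quad \mathbf{q} = q(x) = b_0 + b_1x + \cdots + b_nx^n \]

are polynomials in $P_n$, and if $x_0, x_1, \ldots, x_n$ are distinct real numbers (called sample points), then the formula

\[ \langle \mathbf{p}, \mathbf{q} \rangle = p(x_0)q(x_0) + p(x_1)q(x_1) + \cdots + p(x_n)q(x_n) \tag{10} \]

defines an inner product on $P_n$ called the evaluation inner product at $x_0, x_1, \ldots, x_n$. Algebraically, this can be
viewed as the dot product in $R^{n+1}$ of the $(n+1)$-tuples

\[ (p(x_0), p(x_1), \ldots, p(x_n)) \quad\text{and}\quad (q(x_0), q(x_1), \ldots, q(x_n)) \]

and hence the first three inner product axioms follow from properties of the dot product. The fourth inner
product axiom follows from the fact that

\[ \langle \mathbf{p}, \mathbf{p} \rangle = [p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2 \ge 0 \]

with equality holding if and only if

\[ p(x_0) = p(x_1) = \cdots = p(x_n) = 0 \]

But a nonzero polynomial of degree $n$ or less can have at most $n$ distinct roots, so it must be that $\mathbf{p} = \mathbf{0}$, which
proves that the fourth inner product axiom holds.

The norm of a polynomial $\mathbf{p}$ relative to the evaluation inner product is

\[ \|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2} \tag{11} \]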



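EXAMPLE 9 Working with the Evaluation Inner Product

Let $P_2$ have the evaluation inner product at the points

\[ x_0 = -2, \quad x_1 = 0, \quad\text{and}\quad x_2 = 2 \]

Compute $\langle \mathbf{p}, \mathbf{q} \rangle$ and $\|\mathbf{p}\|$ for the polynomials $\mathbf{p} = p(x) = x^2$ and $\mathbf{q} = q(x) = 1 + x$.

Solution It follows from 10 and 11 that

\[ \langle \mathbf{p}, \mathbf{q} \rangle = p(-2)q(-2) + p(0)q(0) + p(2)q(2) = (4)(-1) + (0)(1) + (4)(3) = 8 \]

\[
\begin{aligned}
\|\mathbf{p}\| &= \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + [p(x_2)]^2} = \sqrt{[p(-2)]^2 + [p(0)]^2 + [p(2)]^2} \\
&= \sqrt{4^2 + 0^2 + 4^2} = \sqrt{32} = 4\sqrt{2}
\end{aligned}
\]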
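
The evaluation inner product is simple to compute directly from its definition. The sketch below reproduces the numbers of Example 9, assuming the polynomials $p(x) = x^2$ and $q(x) = 1 + x$ and the sample points −2, 0, 2; the function name `evaluation_inner` is an illustrative choice.

```python
import math

def evaluation_inner(p, q, points):
    """<p, q> = sum of p(x_i) * q(x_i) over the sample points (Formula 10)."""
    return sum(p(x) * q(x) for x in points)

points = (-2, 0, 2)
p = lambda x: x ** 2
q = lambda x: 1 + x

print(evaluation_inner(p, q, points))              # 8
print(math.sqrt(evaluation_inner(p, p, points)))   # sqrt(32), i.e. 4*sqrt(2)
```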



CALCULUS REQUIRED 

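EXAMPLE 10 An Inner Product on $C[a, b]$

Let $\mathbf{f} = f(x)$ and $\mathbf{g} = g(x)$ be two functions in $C[a, b]$ and define

\[ \langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx \tag{12} \]

We will show that this formula defines an inner product on $C[a, b]$ by verifying the four inner product axioms
for functions $\mathbf{f} = f(x)$, $\mathbf{g} = g(x)$, and $\mathbf{h} = h(x)$ in $C[a, b]$:

1.
\[ \langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx = \langle \mathbf{g}, \mathbf{f} \rangle \]
which proves that Axiom 1 holds.

2.
\[
\begin{aligned}
\langle \mathbf{f} + \mathbf{g}, \mathbf{h} \rangle &= \int_a^b (f(x) + g(x))h(x)\,dx \\
&= \int_a^b f(x)h(x)\,dx + \int_a^b g(x)h(x)\,dx \\
&= \langle \mathbf{f}, \mathbf{h} \rangle + \langle \mathbf{g}, \mathbf{h} \rangle
\end{aligned}
\]
which proves that Axiom 2 holds.

3.
\[ \langle k\mathbf{f}, \mathbf{g} \rangle = \int_a^b kf(x)g(x)\,dx = k\int_a^b f(x)g(x)\,dx = k\langle \mathbf{f}, \mathbf{g} \rangle \]
which proves that Axiom 3 holds.

4. If $\mathbf{f} = f(x)$ is any function in $C[a, b]$, then

\[ \langle \mathbf{f}, \mathbf{f} \rangle = \int_a^b f^2(x)\,dx \ge 0 \tag{13} \]

since $f^2(x) \ge 0$ for all $x$ in the interval $[a, b]$. Moreover, because $f$ is continuous on $[a, b]$, the equality
holds in Formula 13 if and only if the function $f$ is identically zero on $[a, b]$, that is, if and only if $\mathbf{f} = \mathbf{0}$; and
this proves that Axiom 4 holds.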



CALCULUS REQUIRED 

EXAMPLE 11 Norm of a Vector in $C[a, b]$

If $C[a, b]$ has the inner product that was defined in Example 10, then the norm of a function $\mathbf{f} = f(x)$ relative
to this inner product is

\[ \|\mathbf{f}\| = \langle \mathbf{f}, \mathbf{f} \rangle^{1/2} = \sqrt{\int_a^b f^2(x)\,dx} \tag{14} \]

and the unit sphere in this space consists of all functions $\mathbf{f}$ in $C[a, b]$ that satisfy the equation

\[ \int_a^b f^2(x)\,dx = 1 \]

Remark Note that the vector space $P_n$ is a subspace of $C[a, b]$ because polynomials are continuous functions. Thus,
Formula 12 defines an inner product on $P_n$.

Remark Recall from calculus that the arc length of a curve $y = f(x)$ over an interval $[a, b]$ is given by the formula

\[ L = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx \tag{15} \]

Do not confuse this concept of arc length with $\|\mathbf{f}\|$, which is the length (norm) of $\mathbf{f}$ when $\mathbf{f}$ is viewed as a vector in $C[a, b]$.
Formulas 14 and 15 are quite different.
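
The integral inner product on $C[a, b]$ and its norm can be approximated numerically. The sketch below uses composite Simpson's rule, which is a choice made here for illustration and not a method from the text; the sample functions $f(x) = x$ and $g(x) = x^2$ on $[0, 1]$ are likewise hypothetical.

```python
import math

def integral_inner(f, g, a, b, n=1000):
    """Approximate <f, g> = integral of f(x)g(x) over [a, b] by Simpson's rule (n even)."""
    h = (b - a) / n
    total = f(a) * g(a) + f(b) * g(b)
    for i in range(1, n):
        x = a + i * h
        total += (4 if i % 2 else 2) * f(x) * g(x)
    return total * h / 3

def function_norm(f, a, b):
    """Norm of f relative to the integral inner product (Formula 14)."""
    return math.sqrt(integral_inner(f, f, a, b))

f = lambda x: x
g = lambda x: x * x

print(integral_inner(f, g, 0.0, 1.0))   # close to 1/4, the integral of x^3 on [0, 1]
print(function_norm(f, 0.0, 1.0))       # close to 1/sqrt(3)
```

Note that the norm of $f(x) = x$ on $[0, 1]$ is about 0.577, while the arc length of its graph over the same interval is $\sqrt{2}$, which illustrates how different Formulas 14 and 15 are.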



Algebraic Properties of Inner Products 

The following theorem lists some of the algebraic properties of inner products that follow from the inner product axioms. This 
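result is a generalization of Theorem 3.2.3, which applied only to the dot product on $R^n$.

THEOREM 6.1.2

If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in a real inner product space $V$, and if $k$ is a scalar, then

(a) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$

(b) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$

(c) $\langle \mathbf{u}, \mathbf{v} - \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle - \langle \mathbf{u}, \mathbf{w} \rangle$

(d) $\langle \mathbf{u} - \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle - \langle \mathbf{v}, \mathbf{w} \rangle$

(e) $k\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, k\mathbf{v} \rangle$

Proof We will prove part (b) and leave the proofs of the remaining parts as exercises.

\[
\begin{aligned}
\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle &= \langle \mathbf{v} + \mathbf{w}, \mathbf{u} \rangle && \text{[By symmetry]} \\
&= \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle && \text{[By additivity]} \\
&= \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle && \text{[By symmetry]}
\end{aligned}
\]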



The following example illustrates how Theorem 6.1.2 and the defining properties of inner products can be used to perform 
algebraic computations with inner products. As you read through the example, you will find it instructive to justify the steps. 

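EXAMPLE 12 Calculating with Inner Products

\[
\begin{aligned}
\langle \mathbf{u} - 2\mathbf{v}, 3\mathbf{u} + 4\mathbf{v} \rangle &= \langle \mathbf{u}, 3\mathbf{u} + 4\mathbf{v} \rangle - \langle 2\mathbf{v}, 3\mathbf{u} + 4\mathbf{v} \rangle \\
&= \langle \mathbf{u}, 3\mathbf{u} \rangle + \langle \mathbf{u}, 4\mathbf{v} \rangle - \langle 2\mathbf{v}, 3\mathbf{u} \rangle - \langle 2\mathbf{v}, 4\mathbf{v} \rangle \\
&= 3\langle \mathbf{u}, \mathbf{u} \rangle + 4\langle \mathbf{u}, \mathbf{v} \rangle - 6\langle \mathbf{v}, \mathbf{u} \rangle - 8\langle \mathbf{v}, \mathbf{v} \rangle \\
&= 3\|\mathbf{u}\|^2 + 4\langle \mathbf{u}, \mathbf{v} \rangle - 6\langle \mathbf{u}, \mathbf{v} \rangle - 8\|\mathbf{v}\|^2 \\
&= 3\|\mathbf{u}\|^2 - 2\langle \mathbf{u}, \mathbf{v} \rangle - 8\|\mathbf{v}\|^2
\end{aligned}
\]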
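
The identity derived in Example 12 holds in every real inner product space, so it can be spot-checked with any particular inner product. The sketch below uses the weighted inner product $3u_1v_1 + 2u_2v_2$ of Example 1 and two arbitrary sample vectors; the helper `combo` is a hypothetical name for forming $a\mathbf{u} + b\mathbf{v}$.

```python
def inner(u, v):
    """Weighted Euclidean inner product of Example 1: 3*u1*v1 + 2*u2*v2."""
    return 3 * u[0] * v[0] + 2 * u[1] * v[1]

def combo(a, u, b, v):
    """Linear combination a*u + b*v of two vectors."""
    return tuple(a * x + b * y for x, y in zip(u, v))

u, v = (1.0, 2.0), (-3.0, 0.5)     # arbitrary sample vectors

lhs = inner(combo(1, u, -2, v), combo(3, u, 4, v))            # <u - 2v, 3u + 4v>
rhs = 3 * inner(u, u) - 2 * inner(u, v) - 8 * inner(v, v)     # 3||u||^2 - 2<u,v> - 8||v||^2
print(lhs, rhs)                                               # both -173.0
```

A numerical check of this kind does not replace the algebraic justification of each step, but it is a quick way to catch sign errors when expanding inner products by hand.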



Concept Review 

• Inner product axioms 

• Euclidean inner product 

• Euclidean n-space

• Weighted Euclidean inner product 

• Unit circle (sphere) 

• Matrix inner product 

• Norm in an inner product space 

• Distance between two vectors in an inner product space 

• Examples of inner products 

• Properties of inner products 

Skills 

• Compute the inner product of two vectors. 

• Find the norm of a vector. 

• Find the distance between two vectors. 



• Show that a given formula defines an inner product. 

• Show that a given formula does not define an inner product by demonstrating that at least one of the inner product 
space axioms fails. 



Exercise Set 6.1 

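1. Let $\langle \mathbf{u}, \mathbf{v} \rangle$ be the Euclidean inner product on $R^2$, and let $\mathbf{u} = (1, 1)$, $\mathbf{v} = (3, 2)$, $\mathbf{w} = (0, -1)$, and $k = 3$. Compute the
following.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle$

(b) $\langle k\mathbf{v}, \mathbf{w} \rangle$

(c) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle$

(d) $\|\mathbf{v}\|$

(e) $d(\mathbf{u}, \mathbf{v})$

(f) $\|\mathbf{u} - k\mathbf{v}\|$

Answer:

(a) 5

(b) −6

(c) −3

(d) $\sqrt{13}$

(e) $\sqrt{5}$

(f) $\sqrt{89}$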

2. Repeat Exercise 1 for the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + 3u_2v_2$.

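3. Let $\langle \mathbf{u}, \mathbf{v} \rangle$ be the Euclidean inner product on $R^2$, and let $\mathbf{u} = (3, -2)$, $\mathbf{v} = (4, 5)$, $\mathbf{w} = (-1, 6)$, and $k = -4$. Verify the
following.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$

(b) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$

(c) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$

(d) $\langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, k\mathbf{v} \rangle$

(e) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$

Answer:

(a) 2

(b) 11

(c) −13

(d) −8

(e) 0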

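4. Repeat Exercise 3 for the weighted Euclidean inner product $\langle u, v\rangle = 4u_1v_1 + 5u_2v_2$.

5. Let $\langle u, v\rangle$ be the inner product on $R^2$ generated by $\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$, and let $u = (2, 1)$, $v = (-1, 1)$, $w = (0, -1)$. Compute the following.

(a) $\langle u, v\rangle$

(b) $\langle v, w\rangle$

(c) $\langle u + v, w\rangle$

(d) $\|v\|$

(e) $d(v, w)$

(f) $\|v - w\|^2$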
Answer: 

(a) -5 

(b) 1 

(c) -7 

(d) 1 

(e) 1 

(f) 1 

6. 



Repeat Exercise 5 for the inner product on /{^ generated by j j- 

7. Compute (u, vj using the inner product in Example 6. 



(a) 



Answer: 

(a) 3 

(b) 56 

8. Compute $\langle p, q\rangle$ using the inner product in Example 7.

(a) p = - 2-1 :v I 3x'\q = 4-lx^ 

(b) $p = -5 + 2x + x^2$, $q = 3 + 2x - 4x^2$

9. (a) Use Formula 4 to show that $\langle u, v\rangle = 9u_1v_1 + 4u_2v_2$ is the inner product on $R^2$ generated by

(b) Use the inner product in part (a) to compute $\langle u, v\rangle$ if $u = (-3, 2)$ and $v = (1, 7)$.
Answer: 

(b) 29 

10. (a) Use Formula 4 to show that

$$\langle u, v\rangle = 5u_1v_1 - u_1v_2 - u_2v_1 + 10u_2v_2$$

is the inner product on $R^2$ generated by



(b) Use the inner product in part (a) to compute $\langle u, v\rangle$ if $u = (0, -3)$ and $v = (6, 2)$.

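11. Let $u = (u_1, u_2)$ and $v = (v_1, v_2)$. In each part, the given expression is an inner product on $R^2$. Find a matrix that generates it.

(a) $\langle u, v\rangle = 3u_1v_1 + 5u_2v_2$

(b) $\langle u, v\rangle = 4u_1v_1 + 6u_2v_2$

Answer:

(a) $\begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{5} \end{bmatrix}$

(b) $\begin{bmatrix} 2 & 0 \\ 0 & \sqrt{6} \end{bmatrix}$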



12. Let $P_2$ have the inner product in Example 7. In each part, find $\|p\|$.

(a) $p = -2 + 3x + 2x^2$

(b) $p = 4 - 3x^2$



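13. Let $M_{22}$ have the inner product in Example 6. In each part, find $\|A\|$.

(a) $A = \begin{bmatrix} -2 & 5 \\ 3 & 6 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

Answer:

(a) $\sqrt{74}$

(b) 0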

14. Let $P_2$ have the inner product in Example 7. Find $d(p, q)$.

$$p = 3 - x + x^2, \quad q = 2 + 5x^2$$



15. Let M22 have the inner product in Example 6. Find d{A, B). 

■_2 4l „ r-5 1 
6 2 



Answer: 

(a) /ici5 

(b) 1/47 

16. Let P2 have the inner product of Example 9, and let p = 1 + 7: + x and q = 1 — 2x . Compute the following. 

(a) $\langle p, q\rangle$

(b) $\|p\|$

(c) $d(p, q)$

17. Let $P_3$ have the evaluation inner product at the sample points

$$x_0 = -1, \quad x_1 = 0, \quad x_2 = 1, \quad x_3 = 2$$

Find $\langle p, q\rangle$ and $\|p\|$ for $p = x + x^3$ and $q = 1 + x^2$.
Answer:

$\langle p, q\rangle = 50$, $\|p\| = 6\sqrt{3}$

18. In each part, use the given inner product on $R^2$ to find $\|w\|$, where $w = (-1, 3)$.

(a) the Euclidean inner product

(b) the weighted Euclidean inner product $\langle u, v\rangle = 3u_1v_1 + 2u_2v_2$, where $u = (u_1, u_2)$ and $v = (v_1, v_2)$

(c) the inner product generated by the matrix

19. Use the inner products in Exercise 18 to find $d(u, v)$ for $u = (-1, 2)$ and $v = (2, 5)$.
Answer:

(a) $3\sqrt{2}$

(b) $3\sqrt{5}$

(c) $3\sqrt{13}$

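20. Suppose that u, v, and w are vectors such that

$$\langle u, v\rangle = 2, \quad \langle v, w\rangle = -3, \quad \langle u, w\rangle = 5$$
$$\|u\| = 1, \quad \|v\| = 2, \quad \|w\| = 7$$

Evaluate the given expression.

(a) $\langle u + v, v + w\rangle$

(b) $\langle 2v - w, 3u + 2w\rangle$

(c) $\langle u - v - 2w, 4u + v\rangle$

(d) $\|u + v\|$

(e) $\|2w - v\|$

(f) $\|u - 2v + 4w\|$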

21. Sketch the unit circle in $R^2$ using the given inner product.

(a) |u, vj = ^£^ivi -h ^"2V2 

(b) (u, v} = 2«ivi +2^2V2 

Answer: 





22. Find a weighted Euclidean inner product on $R^2$ for which the unit circle is the ellipse shown in the accompanying figure.

Figure Ex-22

23. Let $u = (u_1, u_2)$ and $v = (v_1, v_2)$. Show that the following are inner products on $R^2$ by verifying that the inner product axioms hold.

(a) $\langle u, v\rangle = 3u_1v_1 + 5u_2v_2$

(b) $\langle u, v\rangle = 4u_1v_1 + u_2v_1 + u_1v_2 + 4u_2v_2$

Answer:

For $U = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, we have $\langle U, U\rangle = -2 < 0$, so Axiom 4 fails.



24. Let $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$. Determine which of the following are inner products on $R^3$. For those that are not, list the axioms that do not hold.

(a) (u, v)=£^ivi -i-2^3V3 

(b) |u, vj = Vi + 2^2'^2 + ^2^3 

(c) $\langle u, v\rangle = 2u_1v_1 + u_2v_2 + 4u_3v_3$

(d) $\langle u, v\rangle = u_1v_1 - u_2v_2 + u_3v_3$

25. Show that the following identity holds for vectors in any inner product space. 

||u + v||2 + ||u-v||2 = 2||u||2 + 2||v||2 

Answer: 

(a) _28 

15 

(b) 0 

26. Show that the following identity holds for vectors in any inner product space. 



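$$\langle u, v\rangle = \tfrac{1}{4}\|u + v\|^2 - \tfrac{1}{4}\|u - v\|^2$$

27. Let $U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix}$ and $V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}$. Show that $\langle U, V\rangle = u_1v_1 + u_2v_3 + u_3v_2 + u_4v_4$ is not an inner product on $M_{22}$.

28. Calculus required Let the vector space $P_2$ have the inner product

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

(a) Find $\|p\|$ for $p = 1$, $p = x$, and $p = x^2$.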

(b) Find d (p, q) if p = 1 and q = 

29. Calculus required Use the inner product

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

on $P_3$ to compute $\langle p, q\rangle$.

(a) $p = 1 - x + x^2 + 5x^3$, $q = x - 3x^2$

(b) $p = x - 5x^3$, $q = 2 + 8x^2$

30. Calculus required In each part, use the inner product

$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$$

on $C[0, 1]$ to compute $\langle f, g\rangle$.

(a) $f = \cos 2\pi x$, $g = \sin 2\pi x$

(b) $f = x$, $g = e^x$

(c) $f = \tan\frac{\pi}{4}x$, $g = 1$

31. Prove that Formula 4 defines an inner product on $R^n$.

32. The definition of a complex vector space was given in the first margin note in Section 4.1. The definition of a complex inner product on a complex vector space V is identical to Definition 1 except that scalars are allowed to be complex numbers, and Axiom 1 is replaced by $\langle u, v\rangle = \overline{\langle v, u\rangle}$. The remaining axioms are unchanged. A complex vector space with a complex inner product is called a complex inner product space. Prove that if V is a complex inner product space then

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The dot product on $R^2$ is an example of a weighted inner product.
Answer:

True

(b) The inner product of two vectors cannot be a negative real number.
Answer:

False

(c) $\langle u, v + w\rangle = \langle v, u\rangle + \langle w, u\rangle$.
Answer:

True

(d) $\langle ku, kv\rangle = k^2\langle u, v\rangle$.
Answer:

True

(e) If $\langle u, v\rangle = 0$, then $u = 0$ or $v = 0$.
Answer:

False

(f) If $\|v\|^2 = 0$, then $v = 0$.
Answer:

True

(g) If A is an $n \times n$ matrix, then $\langle u, v\rangle = Au \cdot Av$ defines an inner product on $R^n$.
Answer:

False
6.2 Angle and Orthogonality in Inner Product Spaces

In Section 3.2 we defined the notion of "angle" between vectors in $R^n$. In this section we will extend this idea to general vector spaces. This will enable us to extend the notion of orthogonality as well, thereby setting the groundwork for a variety of new applications.

Cauchy-Schwarz Inequality 

Recall from Formula 20 of Section 3.2 that the angle $\theta$ between two vectors u and v in $R^n$ is

$$\theta = \cos^{-1}\left(\frac{u \cdot v}{\|u\|\|v\|}\right) \tag{1}$$

We were assured that this formula was valid because it followed from the Cauchy-Schwarz inequality (Theorem 3.2.4) that

$$-1 \le \frac{u \cdot v}{\|u\|\|v\|} \le 1 \tag{2}$$

as required for the inverse cosine to be defined. The following generalization of Theorem 3.2.4 will enable us to define the angle between two vectors in any real inner product space.

THEOREM 6.2.1 Cauchy-Schwarz Inequality 

If u and v are vectors in a real inner product space V, then

$$|\langle u, v\rangle| \le \|u\|\|v\| \tag{3}$$



Proof We warn you in advance that the proof presented here depends on a clever trick that is not easy to 
motivate. 

In the case where $u = 0$ the two sides of (3) are equal since $\langle u, v\rangle$ and $\|u\|$ are both zero. Thus, we need only consider the case where $u \ne 0$. Making this assumption, let

$$a = \langle u, u\rangle, \quad b = 2\langle u, v\rangle, \quad c = \langle v, v\rangle$$

and let t be any real number. Since the positivity axiom states that the inner product of any vector with itself is nonnegative, it follows that

$$0 \le \langle tu + v, tu + v\rangle = \langle u, u\rangle t^2 + 2\langle u, v\rangle t + \langle v, v\rangle = at^2 + bt + c$$

This inequality implies that the quadratic polynomial $at^2 + bt + c$ has either no real roots or a repeated real root. Therefore, its discriminant must satisfy the inequality $b^2 - 4ac \le 0$. Expressing the coefficients a, b, and c in terms of the vectors u and v gives $4\langle u, v\rangle^2 - 4\langle u, u\rangle\langle v, v\rangle \le 0$ or, equivalently,

$$\langle u, v\rangle^2 \le \langle u, u\rangle\langle v, v\rangle$$

Taking square roots of both sides and using the fact that $\langle u, u\rangle$ and $\langle v, v\rangle$ are nonnegative yields

$$|\langle u, v\rangle| \le \langle u, u\rangle^{1/2}\langle v, v\rangle^{1/2} \quad \text{or, equivalently,} \quad |\langle u, v\rangle| \le \|u\|\|v\|$$

which completes the proof.
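The discriminant argument in the proof can be illustrated numerically. The following sketch assumes a weighted Euclidean inner product (weights passed in as a tuple); the function names are hypothetical and the check is a spot test, not a proof.

```python
import math


def inner(u, v, weights):
    """Weighted Euclidean inner product; weights are assumed positive."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def cauchy_schwarz_data(u, v, weights):
    """Return (b^2 - 4ac, |<u, v>|, ||u|| ||v||) using a, b, c from the proof."""
    a = inner(u, u, weights)
    b = 2 * inner(u, v, weights)
    c = inner(v, v, weights)
    discriminant = b * b - 4 * a * c
    lhs = abs(inner(u, v, weights))
    rhs = math.sqrt(a) * math.sqrt(c)
    return discriminant, lhs, rhs


disc, lhs, rhs = cauchy_schwarz_data((3, -2), (4, 5), (1, 1))
print(disc <= 0, lhs <= rhs)
```

When u and v are scalar multiples of one another the discriminant is exactly zero, which corresponds to the repeated-root case in the proof.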



The following two alternative forms of the Cauchy-Schwarz inequality are useful to know:

$$\langle u, v\rangle^2 \le \langle u, u\rangle\langle v, v\rangle \tag{4}$$

$$\langle u, v\rangle^2 \le \|u\|^2\|v\|^2 \tag{5}$$



The first of these formulas was obtained in the proof of Theorem 6.2.1, and the second is a variation of the 
first. 



Angle Between Vectors 



Our next goal is to define what is meant by the "angle" between vectors in a real inner product space. As the first step, we leave it for you to use the Cauchy-Schwarz inequality to show that

$$-1 \le \frac{\langle u, v\rangle}{\|u\|\|v\|} \le 1 \tag{6}$$

This being the case, there is a unique angle $\theta$ in radian measure for which

$$\cos\theta = \frac{\langle u, v\rangle}{\|u\|\|v\|} \quad \text{and} \quad 0 \le \theta \le \pi \tag{7}$$

(Figure 6.2.1). This enables us to define the angle $\theta$ between u and v to be

$$\theta = \cos^{-1}\left(\frac{\langle u, v\rangle}{\|u\|\|v\|}\right) \tag{8}$$

Figure 6.2.1



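EXAMPLE 1 Cosine of an Angle Between Two Vectors in $R^4$

Let $R^4$ have the Euclidean inner product. Find the cosine of the angle $\theta$ between the vectors $u = (4, 3, 1, -2)$ and $v = (-2, 1, 2, 3)$.

Solution We leave it for you to verify that

$$\|u\| = \sqrt{30}, \quad \|v\| = \sqrt{18}, \quad \text{and} \quad \langle u, v\rangle = -9$$

from which it follows that

$$\cos\theta = \frac{\langle u, v\rangle}{\|u\|\|v\|} = -\frac{9}{\sqrt{30}\sqrt{18}} = -\frac{3}{2\sqrt{15}}$$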
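The computation in Example 1 can be reproduced with a few lines of code. This is a minimal sketch for the Euclidean inner product only; the names `dot` and `cos_angle` are illustrative.

```python
import math


def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def cos_angle(u, v):
    """Cosine of the angle between nonzero vectors u and v."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


u = (4, 3, 1, -2)
v = (-2, 1, 2, 3)
# The two printed values should agree up to floating-point rounding.
print(cos_angle(u, v), -3 / (2 * math.sqrt(15)))
```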



Properties of Length and Distance in General Inner Product Spaces 

In Section 3.2 we used the dot product to extend the notions of length and distance to $R^n$, and we showed that
various familiar theorems remained valid (see Theorem 3.2.5, Theorem 3.2.6, and Theorem 3.2.7). By making 
only minor adjustments to the proofs of those theorems, we can show that they remain valid in any real inner 
product space. For example, here is the generalization of Theorem 3.2.5 (the triangle inequalities). 

THEOREM 6.2.2 

If u, v, and w are vectors in a real inner product space V, and if k is any scalar, then:

(a) $\|u + v\| \le \|u\| + \|v\|$ [Triangle inequality for vectors]

(b) $d(u, v) \le d(u, w) + d(w, v)$ [Triangle inequality for distances]

Proof (a)

$$\begin{aligned}
\|u + v\|^2 &= \langle u + v, u + v\rangle \\
&= \langle u, u\rangle + 2\langle u, v\rangle + \langle v, v\rangle \\
&\le \langle u, u\rangle + 2|\langle u, v\rangle| + \langle v, v\rangle && \text{[Property of absolute value]} \\
&\le \langle u, u\rangle + 2\|u\|\|v\| + \langle v, v\rangle && \text{[By (3)]} \\
&= \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 \\
&= (\|u\| + \|v\|)^2
\end{aligned}$$

Taking square roots gives $\|u + v\| \le \|u\| + \|v\|$.



Proof (b) Identical to the proof of part (b) of Theorem 3.2.5. 



Orthogonality 

Although Example 1 is a useful mathematical exercise, there is only an occasional need to compute angles in vector spaces other than $R^2$ and $R^3$. A problem of more interest in general vector spaces is ascertaining whether the angle between vectors is $\pi/2$. You should be able to see from Formula 8 that if u and v are nonzero vectors, then the angle between them is $\theta = \pi/2$ if and only if $\langle u, v\rangle = 0$. Accordingly, we make the following definition (which is applicable even if one or both of the vectors is zero).

DEFINITION 1

Two vectors u and v in an inner product space are called orthogonal if $\langle u, v\rangle = 0$.

As the following example shows, orthogonality depends on the inner product in the sense that for different 
inner products two vectors can be orthogonal with respect to one but not the other. 

EXAMPLE 2 Orthogonality Depends on the Inner Product

The vectors $u = (1, 1)$ and $v = (1, -1)$ are orthogonal with respect to the Euclidean inner product on $R^2$, since

$$u \cdot v = (1)(1) + (1)(-1) = 0$$

However, they are not orthogonal with respect to the weighted Euclidean inner product $\langle u, v\rangle = 3u_1v_1 + 2u_2v_2$, since

$$\langle u, v\rangle = 3(1)(1) + 2(1)(-1) = 1 \ne 0$$
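The dependence on the inner product in Example 2 is easy to see computationally. In this sketch the Euclidean inner product is treated as the weighted inner product with weights (1, 1); `weighted_inner` is a hypothetical helper name.

```python
def weighted_inner(u, v, weights):
    """Weighted Euclidean inner product w1*u1*v1 + w2*u2*v2 + ..."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


u, v = (1, 1), (1, -1)
euclidean = weighted_inner(u, v, (1, 1))  # ordinary dot product
weighted = weighted_inner(u, v, (3, 2))   # the weights used in Example 2
print(euclidean, weighted)  # prints: 0 1
```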



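EXAMPLE 3 Orthogonal Vectors in $M_{22}$

If $M_{22}$ has the inner product of Example 6 in the preceding section, then the matrices

$$U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}$$

are orthogonal, since

$$\langle U, V\rangle = 1(0) + 0(2) + 1(0) + 1(0) = 0$$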



CALCULUS REQUIRED 



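EXAMPLE 4 Orthogonal Vectors in $P_2$

Let $P_2$ have the inner product

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

and let $p = x$ and $q = x^2$. Then

$$\|p\| = \langle p, p\rangle^{1/2} = \left[\int_{-1}^{1} xx\,dx\right]^{1/2} = \left[\int_{-1}^{1} x^2\,dx\right]^{1/2} = \sqrt{\tfrac{2}{3}}$$

$$\|q\| = \langle q, q\rangle^{1/2} = \left[\int_{-1}^{1} x^2x^2\,dx\right]^{1/2} = \left[\int_{-1}^{1} x^4\,dx\right]^{1/2} = \sqrt{\tfrac{2}{5}}$$

$$\langle p, q\rangle = \int_{-1}^{1} xx^2\,dx = \int_{-1}^{1} x^3\,dx = 0$$

Because $\langle p, q\rangle = 0$, the vectors $p = x$ and $q = x^2$ are orthogonal relative to the given inner product.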
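The integrals in Example 4 can be evaluated exactly by representing each polynomial as a list of coefficients. The sketch below assumes the integral inner product over $[-1, 1]$ used in the example; the coefficient-list representation and the function names are illustrative choices, not part of the text.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, c2, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integral_inner(p, q):
    """Exact integral of p(x) q(x) over [-1, 1] for integer coefficients."""
    total = Fraction(0)
    for k, c in enumerate(poly_mul(p, q)):
        # The integral of x^k over [-1, 1] is 2/(k+1) if k is even, 0 if odd.
        if k % 2 == 0:
            total += Fraction(2 * c, k + 1)
    return total


p = [0, 1]      # p = x
q = [0, 0, 1]   # q = x^2
print(integral_inner(p, q), integral_inner(p, p), integral_inner(q, q))
# prints: 0 2/3 2/5
```

The last two values are $\|p\|^2$ and $\|q\|^2$, so their square roots give the norms computed in the example.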



In Section 3.3 we proved the Theorem of Pythagoras for vectors in Euclidean n-space. The following theorem
extends this result to vectors in any real inner product space. 



THEOREM 6.2.3 Generalized Theorem of Pythagoras 

If u and V are orthogonal vectors in an inner product space, then 

||u + v||2 = ||u||2 + ||v||2 



Proof The orthogonality of u and v implies that $\langle u, v\rangle = 0$, so

$$\|u + v\|^2 = \langle u + v, u + v\rangle = \|u\|^2 + 2\langle u, v\rangle + \|v\|^2 = \|u\|^2 + \|v\|^2$$



CALCULUS REQUIRED 

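EXAMPLE 5 Theorem of Pythagoras in $P_2$

In Example 4 we showed that $p = x$ and $q = x^2$ are orthogonal with respect to the inner product

$$\langle p, q\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

on $P_2$. It follows from Theorem 6.2.3 that

$$\|p + q\|^2 = \|p\|^2 + \|q\|^2$$

Thus, from the computations in Example 4, we have

$$\|p + q\|^2 = \left(\sqrt{\tfrac{2}{3}}\right)^2 + \left(\sqrt{\tfrac{2}{5}}\right)^2 = \tfrac{2}{3} + \tfrac{2}{5} = \tfrac{16}{15}$$

We can check this result by direct integration:

$$\begin{aligned}
\|p + q\|^2 &= \langle p + q, p + q\rangle = \int_{-1}^{1} (x + x^2)(x + x^2)\,dx \\
&= \int_{-1}^{1} x^2\,dx + 2\int_{-1}^{1} x^3\,dx + \int_{-1}^{1} x^4\,dx = \tfrac{2}{3} + 0 + \tfrac{2}{5} = \tfrac{16}{15}
\end{aligned}$$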



Orthogonal Complements 

In Section 4.8 we defined the notion of an orthogonal complement for subspaces of $R^n$, and we used that
definition to establish a geometric link between the fundamental spaces of a matrix. The following definition 
extends that idea to general inner product spaces. 

DEFINITION 2

If W is a subspace of an inner product space V, then the set of all vectors in V that are orthogonal to every vector in W is called the orthogonal complement of W and is denoted by the symbol $W^\perp$.



In Theorem 4.8.8 we stated three properties of orthogonal complements in $R^n$. The following theorem generalizes parts (a) and (b) of that theorem to general inner product spaces.

THEOREM 6.2.4

If W is a subspace of an inner product space V, then:

(a) $W^\perp$ is a subspace of V.

(b) $W \cap W^\perp = \{0\}$.



Proof (a) The set $W^\perp$ contains at least the zero vector, since $\langle 0, w\rangle = 0$ for every vector w in W. Thus, it remains to show that $W^\perp$ is closed under addition and scalar multiplication. To do this, suppose that u and v are vectors in $W^\perp$, so that for every vector w in W we have $\langle u, w\rangle = 0$ and $\langle v, w\rangle = 0$. It follows from the additivity and homogeneity axioms of inner products that

$$\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle = 0 + 0 = 0$$
$$\langle ku, w\rangle = k\langle u, w\rangle = k(0) = 0$$

which proves that $u + v$ and $ku$ are in $W^\perp$.

Proof (b) If v is any vector in both W and $W^\perp$, then v is orthogonal to itself; that is, $\langle v, v\rangle = 0$. It follows from the positivity axiom for inner products that $v = 0$.

The next theorem, which we state without proof, generalizes part (c) of Theorem 4.8.8. Note, however, that this theorem applies only to finite-dimensional inner product spaces, whereas Theorem 6.2.4 does not have this restriction.

THEOREM 6.2.5

If W is a subspace of a finite-dimensional inner product space V, then the orthogonal complement of $W^\perp$ is W; that is,

$$(W^\perp)^\perp = W$$

Theorem 6.2.5 implies that in a finite-dimensional inner product space orthogonal complements occur in pairs, each being orthogonal to the other (Figure 6.2.2).

Figure 6.2.2 Each vector in W is orthogonal to each vector in $W^\perp$ and conversely



In our study of the fundamental spaces of a matrix in Section 4.8 we showed that the row space and null space of a matrix are orthogonal complements with respect to the Euclidean inner product on $R^n$ (Theorem 4.8.9). The following example takes advantage of that fact.

EXAMPLE 6 Basis for an Orthogonal Complement

Let W be the subspace of $R^6$ spanned by the vectors

$$w_1 = (1, 3, -2, 0, 2, 0), \quad w_2 = (2, 6, -5, -2, 4, -3),$$
$$w_3 = (0, 0, 5, 10, 0, 15), \quad w_4 = (2, 6, 0, 8, 4, 18)$$

Find a basis for the orthogonal complement of W.

Solution The space W is the same as the row space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$

Since the row space and null space of A are orthogonal complements, our problem reduces to finding a basis for the null space of this matrix. In Example 4 of Section 4.7 we showed that

$$v_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for this null space. Expressing these vectors in comma-delimited form (to match that of $w_1$, $w_2$, $w_3$, and $w_4$), we obtain the basis vectors

$$v_1 = (-3, 1, 0, 0, 0, 0), \quad v_2 = (-4, 0, -2, 1, 0, 0), \quad v_3 = (-2, 0, 0, 0, 1, 0)$$

You may want to check that these vectors are orthogonal to $w_1$, $w_2$, $w_3$, and $w_4$ by computing the necessary dot products.
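The check suggested at the end of Example 6 amounts to twelve dot products. A short sketch of that check, using plain tuples and a hypothetical `dot` helper:

```python
def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Spanning vectors of W and the proposed basis of its orthogonal complement.
W = [(1, 3, -2, 0, 2, 0), (2, 6, -5, -2, 4, -3),
     (0, 0, 5, 10, 0, 15), (2, 6, 0, 8, 4, 18)]
V = [(-3, 1, 0, 0, 0, 0), (-4, 0, -2, 1, 0, 0), (-2, 0, 0, 0, 1, 0)]

# One row per w_i, one column per v_j; every entry should be zero.
products = [[dot(w, v) for v in V] for w in W]
print(products)
```

Orthogonality to the four spanning vectors is enough, because every vector in W is a linear combination of them.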



Concept Review 

• Cauchy-Schwarz inequality 

• Angle between vectors 

• Orthogonal vectors 

• Orthogonal complement 

Skills 



• Find the angle between two vectors in an inner product space.

• Determine whether two vectors in an inner product space are orthogonal.

• Find a basis for the orthogonal complement of a subspace of an inner product space.



Exercise Set 6.2 

1. Let $R^2$, $R^3$, and $R^4$ have the Euclidean inner product. In each part, find the cosine of the angle between u and v.

(a) $u = (1, -3)$, $v = (2, 4)$

(b) $u = (-1, 0)$, $v = (3, 8)$

(c) $u = (-1, 5, 2)$, $v = (2, 4, -9)$

(d) $u = (4, 1, 8)$, $v = (1, 0, -3)$

(e) $u = (1, 0, 1, 0)$, $v = (-3, -3, -3, -3)$

(f) $u = (2, 1, 7, -1)$, $v = (4, 0, 0, 0)$

Answer:

(a) $-\frac{1}{\sqrt{2}}$

(b) $-\frac{3}{\sqrt{73}}$

(c) 0

(d) $-\frac{20}{9\sqrt{10}}$

(e) $-\frac{1}{\sqrt{2}}$

(f) $\frac{2}{\sqrt{55}}$


2. Let $P_2$ have the inner product in Example 7 of Section 6.1. Find the cosine of the angle between p and q.

(a) $p = -1 + 5x + 2x^2$, $q = 2 + 4x -$

(b) $p = x - x^2$, $q = 7 + 3x + 3x^2$

3. Let M22 have the inner product in Example 6 of Section 6. 1 . Find the cosine of the angle between A and 
B. 



Answer: 



10/7 

(b) 0 

4. In each part, determine whether the given vectors are orthogonal with respect to the Euclidean inner product.

(a) $u = (-1, 3, 2)$, $v = (4, 2, -1)$

(b) $u = (-2, -2, -2)$, $v = (1, 1, 1)$

(c) $u = (u_1, u_2, u_3)$, $v = (0, 0, 0)$

(d) $u = (-4, 6, -10, 1)$, $v = (2, 1, -2, 9)$

(e) $u = (0, 3, -2, 1)$, $v = (5, 2, -1, 0)$

(f) $u = (a, b)$, $v = (-b, a)$

5. Show that $p = 1 - x + 2x^2$ and $q = 2x + x^2$ are orthogonal with respect to the inner product in Exercise
2. 

6. Let 



Which of the following matrices are orthogonal to A with respect to the inner product in Exercise 3? 




7. Do there exist scalars k and l such that the vectors $u = (2, k, 6)$, $v = (l, 5, 3)$, and $w = (1, 2, 3)$ are mutually orthogonal with respect to the Euclidean inner product?

Answer:

No

8. Let $R^3$ have the Euclidean inner product, and suppose that $u = (1, 1, -1)$ and $v = (6, 7, -15)$. Find a value of k for which $\|ku + v\| = 13$.

9. Let $R^3$ have the Euclidean inner product. For which values of k are u and v orthogonal?

(a) $u = (2, 1, 3)$, $v = (1, 7, k)$

(b) $u = (k, k, 1)$, $v = (k, 5, 6)$

Answer:

(a) $k = -3$

(b) $k = -2, -3$

10. Let $R^4$ have the Euclidean inner product. Find two unit vectors that are orthogonal to all three of the vectors $u = (2, 1, -4, 0)$, $v = (-1, -1, 2, 2)$, and $w = (3, 2, 5, 4)$.

11. In each part, verify that the Cauchy-Schwarz inequality holds for the given vectors using the Euclidean 
inner product. 

(a) $u = (3, 2)$, $v = (4, -1)$

(b) $u = (-3, 1, 0)$, $v = (2, -1, 3)$

(c) $u = (-4, 2, 1)$, $v = (8, -4, -2)$

(d) $u = (0, -2, 2, 1)$, $v = (-1, -1, 1, 1)$

12. In each part, verify that the Cauchy-Schwarz inequality holds for the given vectors.

(a) $u = (-2, 1)$ and $v = (1, 0)$ using the inner product of Example 1 of Section 6.1.



(b) u = 



-1 2 

6 1 



-[3 3] 



and = « « using the inner product in Example 6 of Section 6. 1 



(c) p = — 1 4= 2x + 7:^ and q = 2 — 4x using the inner product given in Example 7 of Section 6. 1 . 

13. Let $R^4$ have the Euclidean inner product, and let $u = (-1, 1, 0, 2)$. Determine whether the vector u is orthogonal to the subspace spanned by the vectors $w_1 = (0, 0, 0, 0)$, $w_2 = (1, -1, 3, 0)$, and $w_3 = (4, 0, 9, 2)$.

Answer: 

No 

In Exercises 14-15, assume that $R^n$ has the Euclidean inner product.

14. Let W be the line in $R^2$ with equation $y = 2x$. Find an equation for $W^\perp$.

15. (a) Let W be the plane in $R^3$ with equation $x - 2y - 3z = 0$. Find parametric equations for $W^\perp$.

(b) Let W be the line in $R^3$ with parametric equations

$$x = 2t, \quad y = -5t, \quad z = 4t$$

Find an equation for $W^\perp$.

(c) Let W be the intersection of the two planes

$$x + y + z = 0 \quad \text{and} \quad x - y + z = 0$$

in $R^3$. Find an equation for $W^\perp$.
Answer:

(a) $x = t$, $y = -2t$, $z = -3t$

(b) $2x - 5y + 4z = 0$

(c) $x - z = 0$
16. Find a basis for the orthogonal complement of the subspace of $R^n$ spanned by the vectors.

(a) $v_1 = (1, -1, 3)$, $v_2 = (5, -4, -4)$, $v_3 = (7, -6, 2)$

(b) $v_1 = (2, 0, -1)$, $v_2 = (4, 0, -2)$

(c) $v_1 = (1, 4, 5, 2)$, $v_2 = (2, 1, 3, 0)$, $v_3 = (-1, 3, 2, 2)$

(d) $v_1 = (1, 4, 5, 6, 9)$, $v_2 = (3, -2, 1, 4, -1)$, $v_3 = (-1, 0, -1, -2, -1)$, $v_4 = (2, 3, 5, 7, 8)$

17. Let V be an inner product space. Show that if u and v are orthogonal unit vectors in V, then $\|u - v\| = \sqrt{2}$.

18. Let V be an inner product space. Show that if w is orthogonal to both $u_1$ and $u_2$, then it is orthogonal to $k_1u_1 + k_2u_2$ for all scalars $k_1$ and $k_2$. Interpret this result geometrically in the case where V is $R^3$ with the Euclidean inner product.

19. Let V be an inner product space. Show that if w is orthogonal to each of the vectors $u_1, u_2, \ldots, u_r$, then it is orthogonal to every vector in span$\{u_1, u_2, \ldots, u_r\}$.

20. Let $\{v_1, v_2, \ldots, v_r\}$ be a basis for an inner product space V. Show that the zero vector is the only vector in V that is orthogonal to all of the basis vectors.

21. Let $\{w_1, w_2, \ldots, w_k\}$ be a basis for a subspace W of V. Show that $W^\perp$ consists of all vectors in V that are orthogonal to every basis vector.

22. Prove the following generalization of Theorem 6.2.3: If $v_1, v_2, \ldots, v_r$ are pairwise orthogonal vectors in an inner product space V, then

$$\|v_1 + v_2 + \cdots + v_r\|^2 = \|v_1\|^2 + \|v_2\|^2 + \cdots + \|v_r\|^2$$

23. Prove: If u and v are « x 1 matrices and v4 is an ^ x « raatrix, then 

.2 



24. Use the Cauchy-Schwarz inequality to prove that for all real values of a, b, and $\theta$,

$$(a\cos\theta + b\sin\theta)^2 \le a^2 + b^2$$

25. Prove: If w \ , m?2, . >i- vi are positive real numbers, and if a = U2 «m) v = (vi, V2 v„) 

are any two vectors in then 

( 2 2 2\^^^ / 2 2 2\^^^ 

wi«i +W2«2 + ■ ■ ■ +Wm«m) (wivi +W2V2 + " ' ' +WmV„ J 

26. Show that equality holds in the Cauchy-Schwarz inequality if and only if u and v are linearly dependent. 

27. Use vector methods to prove that a triangle that is inscribed in a circle so that it has a diameter for a side 
must be a right triangle. [Hint: Express the vectors and £Q in the accompanying figure in terms of u 
andv.] 




Figure Ex-27 

28. As illustrated in the accompanying figure, the vectors $u = (1, \sqrt{3})$ and $v = (-1, \sqrt{3})$ have norm 2 and an angle of 60° between them relative to the Euclidean inner product. Find a weighted Euclidean inner product with respect to which u and v are orthogonal unit vectors.




Figure Ex-28 

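29. Calculus required Let $f(x)$ and $g(x)$ be continuous functions on $[0, 1]$. Prove:

(a) $\left[\int_0^1 f(x)g(x)\,dx\right]^2 \le \left[\int_0^1 f^2(x)\,dx\right]\left[\int_0^1 g^2(x)\,dx\right]$

(b) $\left[\int_0^1 [f(x) + g(x)]^2\,dx\right]^{1/2} \le \left[\int_0^1 f^2(x)\,dx\right]^{1/2} + \left[\int_0^1 g^2(x)\,dx\right]^{1/2}$

[Hint: Use the Cauchy-Schwarz inequality.]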
30. Calculus required Let $C[0, \pi]$ have the inner product

$$\langle f, g\rangle = \int_0^{\pi} f(x)g(x)\,dx$$

and let $f_n = \cos nx$ $(n = 0, 1, 2, \ldots)$. Show that if $k \ne l$, then $f_k$ and $f_l$ are orthogonal vectors.

31. (a) Let W be the line $y = x$ in an xy-coordinate system in $R^2$. Describe the subspace $W^\perp$.

(b) Let W be the y-axis in an xyz-coordinate system in $R^3$. Describe the subspace $W^\perp$.

(c) Let W be the yz-plane of an xyz-coordinate system in $R^3$. Describe the subspace $W^\perp$.



Answer: 

(a) The line y = —x 

(b) Thexz-plane 

(c) The X-axis 

32. Prove that Formula 4 holds for all nonzero vectors u and v in an inner product space V. 



True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) If u is orthogonal to every vector of a subspace W, then $u = 0$.
Answer:

False

(b) If u is a vector in both W and $W^\perp$, then $u = 0$.
Answer:

True

(c) If u and v are vectors in $W^\perp$, then $u + v$ is in $W^\perp$.
Answer:

True

(d) If u is a vector in $W^\perp$ and k is a real number, then $ku$ is in $W^\perp$.
Answer: 

True 

(e) If u and v are orthogonal, then |(u, v}| = ||u|| ||v||. 
Answer: 

False 

(f) If u and V are orthogonal, then ||u + v|| = ||u|| + ||v||. 
Answer: 

False 
6.3 Gram-Schmidt Process; QR-Decomposition



In many problems involving vector spaces, the problem solver is free to choose any basis for the vector space that 
seems appropriate. In inner product spaces, the solution of a problem is often greatly simplified by choosing a basis 
in which the vectors are orthogonal to one another. In this section we will show how such bases can be obtained. 



Orthogonal and Orthonormal Sets 



Recall from Section 6.2 that two vectors in an inner product space are said to be orthogonal if their inner product is 
zero. The following definition extends the notion of orthogonality to sets of vectors in an inner product space. 

DEFINITION 1

A set of two or more vectors in a real inner product space is said to be orthogonal if all pairs of distinct vectors in the set are orthogonal. An orthogonal set in which each vector has norm 1 is said to be orthonormal.



EXAMPLE 1 An Orthogonal Set in $R^3$

Let

$$u_1 = (0, 1, 0), \quad u_2 = (1, 0, 1), \quad u_3 = (1, 0, -1)$$

and assume that $R^3$ has the Euclidean inner product. It follows that the set of vectors $S = \{u_1, u_2, u_3\}$ is orthogonal since $\langle u_1, u_2\rangle = \langle u_1, u_3\rangle = \langle u_2, u_3\rangle = 0$.



If v is a nonzero vector in an inner product space, then it follows from Theorem 6.1.1b with $k = 1/\|v\|$ that

$$\left\|\frac{1}{\|v\|}v\right\| = \left|\frac{1}{\|v\|}\right|\|v\| = \frac{1}{\|v\|}\|v\| = 1$$

from which we see that multiplying a nonzero vector by the reciprocal of its norm produces a vector of norm 1. This process is called normalizing v. It follows that any orthogonal set of nonzero vectors can be converted to an orthonormal set by normalizing each of its vectors.
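Normalization is a one-line computation once a norm is available. The sketch below assumes the Euclidean norm; `norm` and `normalize` are hypothetical helper names, and the zero-vector guard reflects the requirement that v be nonzero.

```python
import math


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def normalize(v):
    """Multiply a nonzero vector by the reciprocal of its norm."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / length for a in v)


unit = normalize((1, 0, 1))
print(unit, norm(unit))  # the norm should be 1 up to rounding
```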

EXAMPLE 2 Constructing an Orthonormal Set

The Euclidean norms of the vectors in Example 1 are

$$\|u_1\| = 1, \quad \|u_2\| = \sqrt{2}, \quad \|u_3\| = \sqrt{2}$$

Consequently, normalizing $u_1$, $u_2$, and $u_3$ yields

$$v_1 = \frac{u_1}{\|u_1\|} = (0, 1, 0), \quad v_2 = \frac{u_2}{\|u_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad v_3 = \frac{u_3}{\|u_3\|} = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$

We leave it for you to verify that the set $S = \{v_1, v_2, v_3\}$ is orthonormal by showing that

$$\langle v_1, v_2\rangle = \langle v_1, v_3\rangle = \langle v_2, v_3\rangle = 0 \quad \text{and} \quad \|v_1\| = \|v_2\| = \|v_3\| = 1$$

In $R^2$ any two nonzero perpendicular vectors are linearly independent because neither is a scalar multiple of the other; and in $R^3$ any three nonzero mutually perpendicular vectors are linearly independent because no one lies in the plane of the other two (and hence is not expressible as a linear combination of the other two). The following theorem generalizes these observations.



THEOREM 6.3.1

If $S = \{v_1, v_2, \ldots, v_n\}$ is an orthogonal set of nonzero vectors in an inner product space, then S is linearly independent.

Proof Assume that

$$k_1v_1 + k_2v_2 + \cdots + k_nv_n = 0 \tag{1}$$

To demonstrate that $S = \{v_1, v_2, \ldots, v_n\}$ is linearly independent, we must prove that $k_1 = k_2 = \cdots = k_n = 0$.

For each $v_i$ in S, it follows from (1) that

$$\langle k_1v_1 + k_2v_2 + \cdots + k_nv_n, v_i\rangle = \langle 0, v_i\rangle = 0$$

or, equivalently,

$$k_1\langle v_1, v_i\rangle + k_2\langle v_2, v_i\rangle + \cdots + k_n\langle v_n, v_i\rangle = 0$$

From the orthogonality of S it follows that $\langle v_j, v_i\rangle = 0$ when $j \ne i$, so this equation reduces to

$$k_i\langle v_i, v_i\rangle = 0$$

Since the vectors in S are assumed to be nonzero, it follows from the positivity axiom for inner products that $\langle v_i, v_i\rangle \ne 0$. Thus, the preceding equation implies that each $k_i$ in Equation 1 is zero, which is what we wanted to prove.

Since an orthonormal set is orthogonal, and since 
its vectors are nonzero (norm 1), it follows from 
Theorem 6.3.1 that every orthonormal set is 
linearly independent. 



In an inner product space, a basis consisting of orthonormal vectors is called an orthonormal basis, and a basis consisting of orthogonal vectors is called an orthogonal basis. A familiar example of an orthonormal basis is the standard basis for $R^n$ with the Euclidean inner product:

$$e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, 0, \ldots, 1)$$



EXAMPLE 3 An Orthonormal Basis

In Example 2 we showed that the vectors

$$v_1 = (0, 1, 0), \quad v_2 = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \text{and} \quad v_3 = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$

form an orthonormal set with respect to the Euclidean inner product on $R^3$. By Theorem 6.3.1, these vectors form a linearly independent set, and since $R^3$ is three-dimensional, it follows from Theorem 4.5.4 that $S = \{v_1, v_2, v_3\}$ is an orthonormal basis for $R^3$.



Coordinates Relative to Orthonormal Bases 

One way to express a vector u as a linear combination of basis vectors

$$S = \{v_1, v_2, \ldots, v_n\}$$

is to convert the vector equation

$$u = c_1v_1 + c_2v_2 + \cdots + c_nv_n$$

to a linear system and solve for the coefficients $c_1, c_2, \ldots, c_n$. However, if the basis happens to be orthogonal or orthonormal, then the following theorem shows that the coefficients can be obtained more simply by computing appropriate inner products.



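THEOREM 6.3.2

(a) If $S = \{v_1, v_2, \ldots, v_n\}$ is an orthogonal basis for an inner product space V, and if u is any vector in V, then

$$u = \frac{\langle u, v_1\rangle}{\|v_1\|^2}v_1 + \frac{\langle u, v_2\rangle}{\|v_2\|^2}v_2 + \cdots + \frac{\langle u, v_n\rangle}{\|v_n\|^2}v_n \tag{2}$$

(b) If $S = \{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis for an inner product space V, and if u is any vector in V, then

$$u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \cdots + \langle u, v_n\rangle v_n \tag{3}$$

Proof (a) Since $S = \{v_1, v_2, \ldots, v_n\}$ is a basis for V, every vector u in V can be expressed in the form

$$u = c_1v_1 + c_2v_2 + \cdots + c_nv_n$$

We will complete the proof by showing that

$$c_i = \frac{\langle u, v_i\rangle}{\|v_i\|^2} \tag{4}$$

for $i = 1, 2, \ldots, n$. To do this, observe first that

$$\langle u, v_i\rangle = \langle c_1v_1 + c_2v_2 + \cdots + c_nv_n, v_i\rangle = c_1\langle v_1, v_i\rangle + c_2\langle v_2, v_i\rangle + \cdots + c_n\langle v_n, v_i\rangle$$

Since S is an orthogonal set, all of the inner products in the last equality are zero except the ith, so we have

$$\langle u, v_i\rangle = c_i\langle v_i, v_i\rangle = c_i\|v_i\|^2$$

Solving this equation for $c_i$ yields (4), which completes the proof.

Proof (b) In this case, $\|v_1\| = \|v_2\| = \cdots = \|v_n\| = 1$, so Formula 2 simplifies to Formula 3.

Using the terminology and notation from Definition 2 of Section 4.4, it follows from Theorem 6.3.2 that the coordinate vector of a vector u in V relative to an orthogonal basis $S = \{v_1, v_2, \ldots, v_n\}$ is

$$(u)_S = \left(\frac{\langle u, v_1\rangle}{\|v_1\|^2}, \frac{\langle u, v_2\rangle}{\|v_2\|^2}, \ldots, \frac{\langle u, v_n\rangle}{\|v_n\|^2}\right) \tag{5}$$

and relative to an orthonormal basis $S = \{v_1, v_2, \ldots, v_n\}$ is

$$(u)_S = (\langle u, v_1\rangle, \langle u, v_2\rangle, \ldots, \langle u, v_n\rangle) \tag{6}$$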

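EXAMPLE 4 A Coordinate Vector Relative to an Orthonormal Basis

Let

$$v_1 = (0, 1, 0), \quad v_2 = \left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right), \quad v_3 = \left(\tfrac{3}{5}, 0, \tfrac{4}{5}\right)$$

It is easy to check that $S = \{v_1, v_2, v_3\}$ is an orthonormal basis for $R^3$ with the Euclidean inner product. Express the vector $u = (1, 1, 1)$ as a linear combination of the vectors in S, and find the coordinate vector $(u)_S$.

Solution We leave it for you to verify that

$$\langle u, v_1\rangle = 1, \quad \langle u, v_2\rangle = -\tfrac{1}{5}, \quad \text{and} \quad \langle u, v_3\rangle = \tfrac{7}{5}$$

Therefore, by Theorem 6.3.2 we have

$$u = v_1 - \tfrac{1}{5}v_2 + \tfrac{7}{5}v_3$$

that is,

$$(1, 1, 1) = (0, 1, 0) - \tfrac{1}{5}\left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right) + \tfrac{7}{5}\left(\tfrac{3}{5}, 0, \tfrac{4}{5}\right)$$

Thus, the coordinate vector of u relative to S is

$$(u)_S = (\langle u, v_1\rangle, \langle u, v_2\rangle, \langle u, v_3\rangle) = \left(1, -\tfrac{1}{5}, \tfrac{7}{5}\right)$$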
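Formula 6 says that coordinates relative to an orthonormal basis are just inner products. The sketch below works in exact rational arithmetic and takes the basis of Example 4 to be $v_1 = (0, 1, 0)$, $v_2 = (-4/5, 0, 3/5)$, $v_3 = (3/5, 0, 4/5)$, which is an assumption about the intended values; the function names are illustrative.

```python
from fractions import Fraction as F


def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def coordinates(u, orthonormal_basis):
    """Coordinates of u relative to an orthonormal basis (Formula 6)."""
    return [dot(u, v) for v in orthonormal_basis]


S = [(F(0), F(1), F(0)),
     (F(-4, 5), F(0), F(3, 5)),
     (F(3, 5), F(0), F(4, 5))]
u = (1, 1, 1)

coords = coordinates(u, S)
# Rebuild u from its coordinates to confirm the linear combination.
rebuilt = [sum(c * v[i] for c, v in zip(coords, S)) for i in range(3)]
print(coords, rebuilt)
```

No linear system is solved here, which is the practical advantage the theorem describes.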



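EXAMPLE 5 An Orthonormal Basis from an Orthogonal Basis

(a) Show that the vectors

$$w_1 = (0, 2, 0), \quad w_2 = (3, 0, 3), \quad w_3 = (-4, 0, 4)$$

form an orthogonal basis for $R^3$ with the Euclidean inner product, and use that basis to find an orthonormal basis by normalizing each vector.

(b) Express the vector $u = (1, 2, 4)$ as a linear combination of the orthonormal basis vectors obtained in part (a).

Solution

(a) The given vectors form an orthogonal set since

$$\langle w_1, w_2\rangle = 0, \quad \langle w_1, w_3\rangle = 0, \quad \langle w_2, w_3\rangle = 0$$

It follows from Theorem 6.3.1 that these vectors are linearly independent and hence form a basis for $R^3$ by Theorem 4.5.4. We leave it for you to calculate the norms of $w_1$, $w_2$, and $w_3$ and then obtain the orthonormal basis

$$v_1 = \frac{w_1}{\|w_1\|} = (0, 1, 0), \quad v_2 = \frac{w_2}{\|w_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad v_3 = \frac{w_3}{\|w_3\|} = \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$

(b) It follows from Formula 3 that

$$u = \langle u, v_1\rangle v_1 + \langle u, v_2\rangle v_2 + \langle u, v_3\rangle v_3$$

We leave it for you to confirm that

$$\langle u, v_1\rangle = (1, 2, 4) \cdot (0, 1, 0) = 2$$
$$\langle u, v_2\rangle = (1, 2, 4) \cdot \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{5}{\sqrt{2}}$$
$$\langle u, v_3\rangle = (1, 2, 4) \cdot \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{3}{\sqrt{2}}$$

and hence that

$$(1, 2, 4) = 2(0, 1, 0) + \frac{5}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) + \frac{3}{\sqrt{2}}\left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$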



Orthogonal Projections 



Many applied problems are best solved by working with orthogonal or orthonormal basis vectors. Such bases are typically found by starting with some simple basis (say a standard basis) and then converting that basis into an orthogonal or orthonormal basis. To explain exactly how that is done will require some preliminary ideas about orthogonal projections.

In Section 3.3 we proved a result called the Projection Theorem (see Theorem 3.3.2) which dealt with the problem of decomposing a vector u in $R^n$ into a sum of two terms, $w_1$ and $w_2$, in which $w_1$ is the orthogonal projection of u on some nonzero vector a and $w_2$ is orthogonal to $w_1$ (Figure 3.3.2). That result is a special case of the following more general theorem.



THEOREM 6.3.3 Projection Theorem

If $W$ is a finite-dimensional subspace of an inner product space $V$, then every vector $u$ in $V$ can be expressed
in exactly one way as

$u = w_1 + w_2$    (7)

where $w_1$ is in $W$ and $w_2$ is in $W^\perp$.

The vectors $w_1$ and $w_2$ in Formula 7 are commonly denoted by

$w_1 = \operatorname{proj}_W u \quad \text{and} \quad w_2 = \operatorname{proj}_{W^\perp} u$    (8)

They are called the orthogonal projection of $u$ on $W$ and the orthogonal projection of $u$ on $W^\perp$, respectively. The
vector $w_2$ is also called the component of $u$ orthogonal to $W$. Using the notation in 8, Formula 7 can be expressed
as

$u = \operatorname{proj}_W u + \operatorname{proj}_{W^\perp} u$    (9)

(Figure 6.3.1). Moreover, since $\operatorname{proj}_{W^\perp} u = u - \operatorname{proj}_W u$, we can also express Formula 9 as

$u = \operatorname{proj}_W u + (u - \operatorname{proj}_W u)$    (10)

Figure 6.3.1



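The following theorem provides formulas for calculating orthogonal projections.

THEOREM 6.3.4

Let $W$ be a finite-dimensional subspace of an inner product space $V$.

(a) If $\{v_1, v_2, \ldots, v_r\}$ is an orthogonal basis for $W$, and $u$ is any vector in $V$, then

$\operatorname{proj}_W u = \frac{\langle u, v_1 \rangle}{\|v_1\|^2} v_1 + \frac{\langle u, v_2 \rangle}{\|v_2\|^2} v_2 + \cdots + \frac{\langle u, v_r \rangle}{\|v_r\|^2} v_r$    (11)

(b) If $\{v_1, v_2, \ldots, v_r\}$ is an orthonormal basis for $W$, and $u$ is any vector in $V$, then

$\operatorname{proj}_W u = \langle u, v_1 \rangle v_1 + \langle u, v_2 \rangle v_2 + \cdots + \langle u, v_r \rangle v_r$    (12)

Proof (a) It follows from Theorem 6.3.3 that the vector $u$ can be expressed in the form $u = w_1 + w_2$, where
$w_1 = \operatorname{proj}_W u$ is in $W$ and $w_2$ is in $W^\perp$; and it follows from Theorem 6.3.2 that the component $\operatorname{proj}_W u = w_1$ can be
expressed in terms of the basis vectors for $W$ as

$\operatorname{proj}_W u = w_1 = \frac{\langle w_1, v_1 \rangle}{\|v_1\|^2} v_1 + \frac{\langle w_1, v_2 \rangle}{\|v_2\|^2} v_2 + \cdots + \frac{\langle w_1, v_r \rangle}{\|v_r\|^2} v_r$    (13)

Since $w_2$ is orthogonal to $W$, it follows that

$\langle w_2, v_1 \rangle = \langle w_2, v_2 \rangle = \cdots = \langle w_2, v_r \rangle = 0$

so we can rewrite 13 as

$\operatorname{proj}_W u = w_1 = \frac{\langle w_1 + w_2, v_1 \rangle}{\|v_1\|^2} v_1 + \frac{\langle w_1 + w_2, v_2 \rangle}{\|v_2\|^2} v_2 + \cdots + \frac{\langle w_1 + w_2, v_r \rangle}{\|v_r\|^2} v_r$

or, equivalently, as

$\operatorname{proj}_W u = w_1 = \frac{\langle u, v_1 \rangle}{\|v_1\|^2} v_1 + \frac{\langle u, v_2 \rangle}{\|v_2\|^2} v_2 + \cdots + \frac{\langle u, v_r \rangle}{\|v_r\|^2} v_r$

Proof (b) In this case, $\|v_1\| = \|v_2\| = \cdots = \|v_r\| = 1$, so Formula 11 simplifies to Formula 12.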

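EXAMPLE 6 Calculating Projections

Let $R^3$ have the Euclidean inner product, and let $W$ be the subspace spanned by the orthonormal
vectors $v_1 = (0, 1, 0)$ and $v_2 = \left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right)$. From Formula 12 the orthogonal projection of
$u = (1, 1, 1)$ on $W$ is

$\operatorname{proj}_W u = \langle u, v_1 \rangle v_1 + \langle u, v_2 \rangle v_2 = (1)(0, 1, 0) + \left(-\tfrac{1}{5}\right)\left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right) = \left(\tfrac{4}{25}, 1, -\tfrac{3}{25}\right)$

The component of $u$ orthogonal to $W$ is

$\operatorname{proj}_{W^\perp} u = u - \operatorname{proj}_W u = (1, 1, 1) - \left(\tfrac{4}{25}, 1, -\tfrac{3}{25}\right) = \left(\tfrac{21}{25}, 0, \tfrac{28}{25}\right)$

Observe that $\operatorname{proj}_{W^\perp} u$ is orthogonal to both $v_1$ and $v_2$, so this vector is orthogonal to each vector in
the space $W$ spanned by $v_1$ and $v_2$, as it should be.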
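As an illustration of Formula 11, here is a small Python sketch (an addition, not from the original text) that projects a vector onto the span of an orthogonal set and then forms the orthogonal component. It assumes the data of Example 6, with $v_2 = (-4/5, 0, 3/5)$; the function name `project` is hypothetical.

```python
from fractions import Fraction as F

def inner(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def project(u, orthogonal_basis):
    """Formula 11: sum of (<u, v> / ||v||^2) v over an orthogonal basis of W."""
    proj = [F(0)] * len(u)
    for v in orthogonal_basis:
        c = F(inner(u, v)) / F(inner(v, v))
        proj = [p + c * vi for p, vi in zip(proj, v)]
    return proj

W_basis = [(F(0), F(1), F(0)), (F(-4, 5), F(0), F(3, 5))]
u = (F(1), F(1), F(1))

w1 = project(u, W_basis)               # orthogonal projection of u on W
w2 = [a - b for a, b in zip(u, w1)]    # component of u orthogonal to W
print(w1, w2)
```

Dividing by $\|v\|^2$ means the same function works for an orthogonal basis that has not been normalized; for an orthonormal basis the divisor is simply 1.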



A Geometric Interpretation of Orthogonal Projections

If $W$ is a one-dimensional subspace of an inner product space $V$, say span$\{a\}$, then Formula 11 has only the one
term

$\operatorname{proj}_W u = \frac{\langle u, a \rangle}{\|a\|^2} a$

In the special case where $V$ is $R^3$ with the Euclidean inner product, this is exactly Formula 10 of Section 3.3 for the
orthogonal projection of $u$ along $a$. This suggests that we can think of 11 as the sum of orthogonal projections on
"axes" determined by the basis vectors for the subspace $W$ (Figure 6.3.2).

Figure 6.3.2



The Gram-Schmidt Process 

We have seen that orthonormal bases exhibit a variety of useful properties. Our next theorem, which is the main
result in this section, shows that every nonzero finite-dimensional inner product space has an orthonormal basis. The proof
of this result is extremely important, since it provides an algorithm, or method, for converting an arbitrary basis into
an orthonormal basis.

THEOREM 6.3.5

Every nonzero finite-dimensional inner product space has an orthonormal basis.

Proof Let $W$ be any nonzero finite-dimensional subspace of an inner product space, and suppose that
$\{u_1, u_2, \ldots, u_r\}$ is any basis for $W$. It suffices to show that $W$ has an orthogonal basis, since the vectors in that basis
can be normalized to obtain an orthonormal basis. The following sequence of steps will produce an orthogonal basis
$\{v_1, v_2, \ldots, v_r\}$ for $W$:



Step 1. Let $v_1 = u_1$.

Step 2. As illustrated in Figure 6.3.3, we can obtain a vector $v_2$ that is orthogonal to $v_1$ by computing the
component of $u_2$ that is orthogonal to the space $W_1$ spanned by $v_1$. Using Formula 11 to perform this
computation we obtain

$v_2 = u_2 - \operatorname{proj}_{W_1} u_2 = u_2 - \frac{\langle u_2, v_1 \rangle}{\|v_1\|^2} v_1$

Of course, if $v_2 = 0$, then $v_2$ is not a basis vector. But this cannot happen, since it would then follow from
the above formula for $v_2$ that

$u_2 = \frac{\langle u_2, v_1 \rangle}{\|v_1\|^2} v_1 = \frac{\langle u_2, u_1 \rangle}{\|u_1\|^2} u_1$

which implies that $u_2$ is a multiple of $u_1$, contradicting the linear independence of the basis
$S = \{u_1, u_2, \ldots, u_r\}$.

Figure 6.3.3

Step 3. To construct a vector $v_3$ that is orthogonal to both $v_1$ and $v_2$, we compute the component of $u_3$ orthogonal
to the space $W_2$ spanned by $v_1$ and $v_2$ (Figure 6.3.4). Using Formula 11 to perform this computation we
obtain

$v_3 = u_3 - \operatorname{proj}_{W_2} u_3 = u_3 - \frac{\langle u_3, v_1 \rangle}{\|v_1\|^2} v_1 - \frac{\langle u_3, v_2 \rangle}{\|v_2\|^2} v_2$

As in Step 2, the linear independence of $\{u_1, u_2, \ldots, u_r\}$ ensures that $v_3 \neq 0$. We leave the details for you.

Figure 6.3.4

Step 4. To determine a vector $v_4$ that is orthogonal to $v_1$, $v_2$, and $v_3$, we compute the component of $u_4$ orthogonal
to the space $W_3$ spanned by $v_1$, $v_2$, and $v_3$. From 11,

$v_4 = u_4 - \operatorname{proj}_{W_3} u_4 = u_4 - \frac{\langle u_4, v_1 \rangle}{\|v_1\|^2} v_1 - \frac{\langle u_4, v_2 \rangle}{\|v_2\|^2} v_2 - \frac{\langle u_4, v_3 \rangle}{\|v_3\|^2} v_3$

Continuing in this way we will produce an orthogonal set of vectors $\{v_1, v_2, \ldots, v_r\}$ after $r$ steps. Since orthogonal
sets are linearly independent, this set will be an orthogonal basis for the $r$-dimensional space $W$. By normalizing
these basis vectors we can obtain an orthonormal basis.



The step-by-step construction of an orthogonal (or orthonormal) basis given in the foregoing proof is called the
Gram-Schmidt process. For reference, we provide the following summary of the steps.

The Gram-Schmidt Process

To convert a basis $\{u_1, u_2, \ldots, u_r\}$ into an orthogonal basis $\{v_1, v_2, \ldots, v_r\}$, perform the following
computations:

Step 1. $v_1 = u_1$

Step 2. $v_2 = u_2 - \frac{\langle u_2, v_1 \rangle}{\|v_1\|^2} v_1$

Step 3. $v_3 = u_3 - \frac{\langle u_3, v_1 \rangle}{\|v_1\|^2} v_1 - \frac{\langle u_3, v_2 \rangle}{\|v_2\|^2} v_2$

Step 4. $v_4 = u_4 - \frac{\langle u_4, v_1 \rangle}{\|v_1\|^2} v_1 - \frac{\langle u_4, v_2 \rangle}{\|v_2\|^2} v_2 - \frac{\langle u_4, v_3 \rangle}{\|v_3\|^2} v_3$

(continue for r steps)

Optional Step. To convert the orthogonal basis into an orthonormal basis $\{q_1, q_2, \ldots, q_r\}$, normalize the
orthogonal basis vectors.
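The steps of the Gram-Schmidt process translate almost line for line into code. The sketch below is an illustrative addition that assumes the Euclidean inner product on $R^n$; the function names are hypothetical. It keeps the orthogonal vectors as exact fractions and normalizes only at the end, using floating point for the square roots.

```python
from fractions import Fraction as F
from math import sqrt

def inner(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(basis):
    """Return an orthogonal basis spanning the same space as `basis`."""
    ortho = []
    for u in basis:
        v = [F(x) for x in u]
        for w in ortho:
            c = F(inner(u, w)) / inner(w, w)      # <u_k, v_j> / ||v_j||^2
            v = [vi - c * wi for vi, wi in zip(v, w)]
        if all(x == 0 for x in v):
            raise ValueError("input vectors are linearly dependent")
        ortho.append(v)
    return ortho

def normalize(v):
    """Optional step: scale v to unit length (floating point)."""
    n = sqrt(inner(v, v))
    return [float(x) / n for x in v]

vs = gram_schmidt([(1, 1, 1), (0, 1, 1), (0, 0, 1)])
qs = [normalize(v) for v in vs]
print(vs)
```

Postponing normalization mirrors the hand computation: the intermediate vectors stay free of square roots, and a zero vector at any step signals that the input was not a basis.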



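EXAMPLE 7 Using the Gram-Schmidt Process

Assume that the vector space $R^3$ has the Euclidean inner product. Apply the Gram-Schmidt process
to transform the basis vectors

$u_1 = (1, 1, 1), \quad u_2 = (0, 1, 1), \quad u_3 = (0, 0, 1)$

into an orthogonal basis $\{v_1, v_2, v_3\}$, and then normalize the orthogonal basis vectors to obtain an
orthonormal basis $\{q_1, q_2, q_3\}$.

Solution

Step 1. $v_1 = u_1 = (1, 1, 1)$

Step 2. $v_2 = u_2 - \operatorname{proj}_{W_1} u_2 = u_2 - \frac{\langle u_2, v_1 \rangle}{\|v_1\|^2} v_1 = (0, 1, 1) - \tfrac{2}{3}(1, 1, 1) = \left(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$

Step 3. $v_3 = u_3 - \operatorname{proj}_{W_2} u_3 = u_3 - \frac{\langle u_3, v_1 \rangle}{\|v_1\|^2} v_1 - \frac{\langle u_3, v_2 \rangle}{\|v_2\|^2} v_2 = (0, 0, 1) - \tfrac{1}{3}(1, 1, 1) - \frac{1/3}{2/3}\left(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right) = \left(0, -\tfrac{1}{2}, \tfrac{1}{2}\right)$

Thus,

$v_1 = (1, 1, 1), \quad v_2 = \left(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right), \quad v_3 = \left(0, -\tfrac{1}{2}, \tfrac{1}{2}\right)$

form an orthogonal basis for $R^3$. The norms of these vectors are

$\|v_1\| = \sqrt{3}, \quad \|v_2\| = \tfrac{\sqrt{6}}{3}, \quad \|v_3\| = \tfrac{1}{\sqrt{2}}$

so an orthonormal basis for $R^3$ is

$q_1 = \frac{v_1}{\|v_1\|} = \left(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right), \quad q_2 = \frac{v_2}{\|v_2\|} = \left(-\tfrac{2}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\right), \quad q_3 = \frac{v_3}{\|v_3\|} = \left(0, -\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$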



Remark In the last example we normalized at the end to convert the orthogonal basis into an orthonormal basis. 
Alternatively, we could have normalized each orthogonal basis vector as soon as it was obtained, thereby producing 
an orthonormal basis step by step. However, that procedure generally has the disadvantage in hand calculation of 
producing more square roots to manipulate. A more useful variation is to "scale" the orthogonal basis vectors at 
each step to eliminate some of the fractions. For example, after Step 2 above, we could have multiplied by 3 to 
produce ( — 2, 1, 1) as the second orthogonal basis vector, thereby simplifying the calculations in Step 3. 




Erhard Schmidt (1876-1959)

Historical Note Schmidt was a German mathematician who studied for his doctoral degree at Göttingen
University under David Hilbert, one of the giants of modern mathematics. For most of his life he taught at
Berlin University where, in addition to making important contributions to many branches of mathematics,
he fashioned some of Hilbert's ideas into a general concept, called a Hilbert space, a fundamental idea in
the study of infinite-dimensional vector spaces. He first described the process that bears his name in a paper
on integral equations that he published in 1907.
[Image: Archives of the Mathematisches Forschungsinst] 



Historical Note Gram was a Danish actuary whose early education was at village schools
supplemented by private tutoring. He obtained a doctorate degree in mathematics while working for the
Hafnia Life Insurance Company, where he specialized in the mathematics of accident insurance. It was in his
dissertation that his contributions to the Gram-Schmidt process were formulated. He eventually became
interested in abstract mathematics and received a gold medal from the Royal Danish Society of Sciences
and Letters in recognition of his work. His lifelong interest in applied mathematics never wavered, however,
and he produced a variety of treatises on Danish forest management.
[Image: wikipedia] 



Jørgen Pedersen Gram (1850-1916)




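CALCULUS REQUIRED

EXAMPLE 8 Legendre Polynomials

Let the vector space $P_2$ have the inner product

$\langle p, q \rangle = \int_{-1}^{1} p(x) q(x) \, dx$

Apply the Gram-Schmidt process to transform the standard basis $\{1, x, x^2\}$ for $P_2$ into an orthogonal basis
$\{\phi_1(x), \phi_2(x), \phi_3(x)\}$.

Solution Take $u_1 = 1$, $u_2 = x$, and $u_3 = x^2$.

Step 1. $v_1 = u_1 = 1$

Step 2. We have

$\langle u_2, v_1 \rangle = \int_{-1}^{1} x \, dx = 0$

so

$v_2 = u_2 - \frac{\langle u_2, v_1 \rangle}{\|v_1\|^2} v_1 = u_2 = x$

Step 3. We have

$\langle u_3, v_1 \rangle = \int_{-1}^{1} x^2 \, dx = \left[\frac{x^3}{3}\right]_{-1}^{1} = \frac{2}{3}$

$\langle u_3, v_2 \rangle = \int_{-1}^{1} x^3 \, dx = \left[\frac{x^4}{4}\right]_{-1}^{1} = 0$

$\|v_1\|^2 = \langle v_1, v_1 \rangle = \int_{-1}^{1} 1 \, dx = \Big[x\Big]_{-1}^{1} = 2$

so

$v_3 = u_3 - \frac{\langle u_3, v_1 \rangle}{\|v_1\|^2} v_1 - \frac{\langle u_3, v_2 \rangle}{\|v_2\|^2} v_2 = x^2 - \frac{1}{3}$

Thus, we have obtained the orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$ in which

$\phi_1(x) = 1, \quad \phi_2(x) = x, \quad \phi_3(x) = x^2 - \tfrac{1}{3}$

Remark The orthogonal basis vectors in the foregoing example are often scaled so all three functions have a value
of 1 at $x = 1$. The resulting polynomials

$1, \quad x, \quad \tfrac{1}{2}(3x^2 - 1)$

which are known as the first three Legendre polynomials, play an important role in a variety of applications. The
scaling does not affect the orthogonality.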
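The same process can be carried out on polynomials by representing each one as a list of coefficients. The following Python sketch is an illustrative addition: it assumes the inner product $\langle p, q \rangle = \int_{-1}^{1} p(x) q(x)\,dx$ and stores $a_0 + a_1 x + a_2 x^2$ as `[a0, a1, a2]`; all function names are hypothetical.

```python
from fractions import Fraction as F

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1]; x^k contributes 2/(k+1) for even k, 0 for odd k."""
    return sum(F(2, k + 1) * c for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)

def gram_schmidt(basis):
    """Orthogonalize the polynomials in `basis` with respect to `inner`."""
    ortho = []
    for u in basis:
        v = [F(c) for c in u]
        for w in ortho:
            coeff = inner(u, w) / inner(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        ortho.append(v)
    return ortho

standard = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # 1, x, x^2
phi = gram_schmidt(standard)
# Rescale so each polynomial equals 1 at x = 1 (its value there is the sum of its coefficients).
legendre = [[c / sum(p) for c in p] for p in phi]
print(phi, legendre)
```

Only the definition of `inner` changes relative to the Euclidean case; the orthogonalization loop itself is the same, which is the point of working in a general inner product space.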



Extending Orthonormal Sets to Orthonormal Bases 

Recall from part (b) of Theorem 4.5.5 that a linearly independent set in a finite-dimensional vector space can be 
enlarged to a basis by adding appropriate vectors. The following theorem is an analog of that result for orthogonal 
and orthonormal sets in finite-dimensional inner product spaces. 



THEOREM 6.3.6

If $W$ is a finite-dimensional inner product space, then:

(a) Every orthogonal set of nonzero vectors in $W$ can be enlarged to an orthogonal basis for $W$.

(b) Every orthonormal set in $W$ can be enlarged to an orthonormal basis for $W$.

We will prove part (b) and leave part (a) as an exercise.

Proof (b) Suppose that $S = \{v_1, v_2, \ldots, v_s\}$ is an orthonormal set of vectors in $W$. Part (b) of Theorem 4.5.5 tells
us that we can enlarge $S$ to some basis

$S' = \{v_1, v_2, \ldots, v_s, v_{s+1}, \ldots, v_k\}$

for $W$. If we now apply the Gram-Schmidt process to the set $S'$, then the vectors $v_1, v_2, \ldots, v_s$ will not be affected
since they are already orthonormal, and the resulting set

$S'' = \{v_1, v_2, \ldots, v_s, v'_{s+1}, \ldots, v'_k\}$

will be an orthonormal basis for $W$.



OPTIONAL

QR-Decomposition

In recent years a numerical algorithm based on the Gram-Schmidt process, and known as QR-decomposition, has
assumed growing importance as the mathematical foundation for a wide variety of numerical algorithms, including
those for computing eigenvalues of large matrices. The technical aspects of such algorithms are discussed in
textbooks that specialize in the numerical aspects of linear algebra. However, we will discuss some of the
underlying ideas here. We begin by posing the following problem.

Problem

If $A$ is an $m \times n$ matrix with linearly independent column vectors, and if $Q$ is the matrix that results by
applying the Gram-Schmidt process to the column vectors of $A$, what relationship, if any, exists between $A$
and $Q$?

To solve this problem, suppose that the column vectors of $A$ are $u_1, u_2, \ldots, u_n$ and the orthonormal column vectors
of $Q$ are $q_1, q_2, \ldots, q_n$. Thus, $A$ and $Q$ can be written in partitioned form as

$A = [u_1 \mid u_2 \mid \cdots \mid u_n] \quad \text{and} \quad Q = [q_1 \mid q_2 \mid \cdots \mid q_n]$

It follows from Theorem 6.3.2b that $u_1, u_2, \ldots, u_n$ are expressible in terms of the vectors $q_1, q_2, \ldots, q_n$ as

$u_1 = \langle u_1, q_1 \rangle q_1 + \langle u_1, q_2 \rangle q_2 + \cdots + \langle u_1, q_n \rangle q_n$

$u_2 = \langle u_2, q_1 \rangle q_1 + \langle u_2, q_2 \rangle q_2 + \cdots + \langle u_2, q_n \rangle q_n$

$\vdots$

$u_n = \langle u_n, q_1 \rangle q_1 + \langle u_n, q_2 \rangle q_2 + \cdots + \langle u_n, q_n \rangle q_n$

Recalling from Section 1.3 (Example 9) that the $j$th column vector of a matrix product is a linear combination of the
column vectors of the first factor with coefficients coming from the $j$th column of the second factor, it follows that
these relationships can be expressed in matrix form as

$[u_1 \mid u_2 \mid \cdots \mid u_n] = [q_1 \mid q_2 \mid \cdots \mid q_n] \begin{bmatrix} \langle u_1, q_1 \rangle & \langle u_2, q_1 \rangle & \cdots & \langle u_n, q_1 \rangle \\ \langle u_1, q_2 \rangle & \langle u_2, q_2 \rangle & \cdots & \langle u_n, q_2 \rangle \\ \vdots & \vdots & & \vdots \\ \langle u_1, q_n \rangle & \langle u_2, q_n \rangle & \cdots & \langle u_n, q_n \rangle \end{bmatrix}$

or more briefly as

$A = QR$    (14)

where $R$ is the second factor in the product. However, it is a property of the Gram-Schmidt process that for $j \geq 2$,
the vector $q_j$ is orthogonal to $u_1, u_2, \ldots, u_{j-1}$. Thus, all entries below the main diagonal of $R$ are zero, and $R$ has the
form

$R = \begin{bmatrix} \langle u_1, q_1 \rangle & \langle u_2, q_1 \rangle & \cdots & \langle u_n, q_1 \rangle \\ 0 & \langle u_2, q_2 \rangle & \cdots & \langle u_n, q_2 \rangle \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \langle u_n, q_n \rangle \end{bmatrix}$    (15)

We leave it for you to show that $R$ is invertible by showing that its diagonal entries are nonzero. Thus, Equation 14
is a factorization of $A$ into the product of a matrix $Q$ with orthonormal column vectors and an invertible upper
triangular matrix $R$. We call Equation 14 the QR-decomposition of $A$. In summary, we have the following theorem.



THEOREM 6.3.7 QR-Decomposition

If $A$ is an $m \times n$ matrix with linearly independent column vectors, then $A$ can be factored as

$A = QR$

where $Q$ is an $m \times n$ matrix with orthonormal column vectors, and $R$ is an $n \times n$ invertible upper triangular
matrix.

It is common in numerical linear algebra to say
that a matrix with linearly independent columns
has full column rank.

Recall from Theorem 5.1.6 (the Equivalence Theorem) that a square matrix has linearly independent column
vectors if and only if it is invertible. Thus, it follows from the foregoing theorem that every invertible matrix has a
QR-decomposition.

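EXAMPLE 9 QR-Decomposition of a 3 x 3 Matrix

Find the QR-decomposition of

$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$

Solution The column vectors of $A$ are

$u_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$

Applying the Gram-Schmidt process with normalization to these column vectors yields the
orthonormal vectors (see Example 7)

$q_1 = \begin{bmatrix} \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{3}} \end{bmatrix}, \quad q_2 = \begin{bmatrix} -\tfrac{2}{\sqrt{6}} \\ \tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{6}} \end{bmatrix}, \quad q_3 = \begin{bmatrix} 0 \\ -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}$

Thus, it follows from Formula 15 that $R$ is

$R = \begin{bmatrix} \langle u_1, q_1 \rangle & \langle u_2, q_1 \rangle & \langle u_3, q_1 \rangle \\ 0 & \langle u_2, q_2 \rangle & \langle u_3, q_2 \rangle \\ 0 & 0 & \langle u_3, q_3 \rangle \end{bmatrix} = \begin{bmatrix} \tfrac{3}{\sqrt{3}} & \tfrac{2}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \\ 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} \\ 0 & 0 & \tfrac{1}{\sqrt{2}} \end{bmatrix}$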



Show that the matrix $Q$ in Example 9 has
the property $Q^T Q = I$, and show that every
$m \times n$ matrix with orthonormal column
vectors has this property.

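from which it follows that the QR-decomposition of $A$ is

$\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \tfrac{1}{\sqrt{3}} & -\tfrac{2}{\sqrt{6}} & 0 \\ \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{6}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} \tfrac{3}{\sqrt{3}} & \tfrac{2}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \\ 0 & \tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} \\ 0 & 0 & \tfrac{1}{\sqrt{2}} \end{bmatrix}}_{R}$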
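The construction behind Formula 15 can be sketched in a few lines of Python. This is an illustrative addition, not a production algorithm: it uses the classical Gram-Schmidt process in floating point, assumes the columns are linearly independent (otherwise a division by zero occurs), and the function name `qr_gram_schmidt` is hypothetical. Numerical libraries typically compute QR by more stable methods.

```python
from math import sqrt, isclose

def inner(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def qr_gram_schmidt(columns):
    """Classical Gram-Schmidt QR. `columns` lists the column vectors u_1, ..., u_n of A.
    Returns (q_columns, R) with R[i][j] = <u_j, q_i>, so R is upper triangular."""
    n = len(columns)
    qs = []
    R = [[0.0] * n for _ in range(n)]
    for j, u in enumerate(columns):
        v = [float(x) for x in u]
        for i, q in enumerate(qs):
            R[i][j] = inner(u, q)
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = sqrt(inner(v, v))          # equals <u_j, q_j>
        qs.append([vk / R[j][j] for vk in v])
    return qs, R

cols = [(1, 1, 1), (0, 1, 1), (0, 0, 1)]     # columns of the matrix A in Example 9
Q_cols, R = qr_gram_schmidt(cols)
# Rebuild each column of A as u_j = sum_i R[i][j] q_i to confirm A = QR.
rebuilt = [[sum(R[i][j] * Q_cols[i][k] for i in range(3)) for k in range(3)] for j in range(3)]
print(R)
```

Storing $Q$ and $A$ by columns keeps the code close to the partitioned forms $[u_1 \mid \cdots \mid u_n]$ and $[q_1 \mid \cdots \mid q_n]$ used in the derivation.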



Concept Review

• Orthogonal and orthonormal sets

• Normalizing a vector

• Orthogonal projections

• Gram-Schmidt process

• QR-decomposition

Skills

• Determine whether a set of vectors is orthogonal (or orthonormal).

• Compute the coordinates of a vector with respect to an orthogonal (or orthonormal) basis.

• Find the orthogonal projection of a vector onto a subspace.

• Use the Gram-Schmidt process to construct an orthogonal (or orthonormal) basis for an inner product
space.

• Find the QR-decomposition of an invertible matrix.



Exercise Set 6.3 

1. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on $R^2$?

(a) (0, 1), (2, 0)

[ {2' {iy\{i' {2} 



(d) (0,0), (0,1) 



Answer: 



(a), (b), (d) 

2. Which of the sets in Exercise 1 are orthonormal with respect to the Euclidean inner product on $R^2$?

3. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on $R^3$?



/2' ' {3' v''2' ' 1/2 J 



(a) ( 1 JL 



(b) (2 _2 X] (2 1 _2\ a 2 2\ 

\3' 3'3M3'3' 3/\3'3'3j 



(d) 



1 1 



1 



1 



. 0 



Answer: 



(b), (d) 

4. Which of the sets in Exercise 3 are orthonormal with respect to the Euclidean inner product on $R^3$?

5. Which of the following sets of polynomials are orthonormal with respect to the inner product on P2 discussed in 
Example 7 of Section 6.1 ? 

(a) pi(;,) = I - + Y, P2{^) = I + - f P^i^) = 3 + 3^+3^^ 



(b) PI (X) = 1. P2{x) = -J=x + -J=x2, p^{x) = x2 



Answer: 



(a) 



6. Which of the following sets of matrices are orthonormal with respect to the inner product on Jl/22 discussed in 
Example 6 of Section 6.1 ? 

(a) . , r 0 I 



(b) 



[0 0]' 

[0 oj' [0 0/ 







0 • 












3 










2 


1 




2 2 






3 


3 




3 3 


ro 


0" 




"0 ( 




1 


1 


9 


1 







7. Verify that the given vectors form an orthogonal set with respect to the Euclidean inner product; then convert it 
to an orthonormal set by normalizing the vectors. 

(a) (-1.2), (6, 3) 

(b) (1.0. -1), (2. 0.2), (0.5.0) 

(c) fi i iW_i i ol (1 i -2^ 

Answer: 

(a) / L 2_\ lj_ _l1 

r ^- ^j- [f5- fsj 

8. Verify that the set of vectors {(1, 0), (0, 1)} is orthogonal with respect to the inner product
$\langle u, v \rangle = 4u_1v_1 + u_2v_2$ on $R^2$; then convert it to an orthonormal set by normalizing the vectors.



9. Verify that the vectors 



VI = ( - ^} '^^2 = f ' = (0, 0. 1) 



form an orthonormal basis for with the Euclidean inner product; then use Theorem 6.3.26 to express each of 
the following as linear combinations of vj, V2, and V3. 

(a) (1, -1.2) 

(b) (3, -7,4) 

(c) (1 -11] 



Answer: 



(a) _2vi+iv2 + 2v3 

(b) _^vi - •|v2 + 4v3 



(c) _lvi_iv2 + |v3 

10. Verify that the vectors

$v_1 = (1, -1, 2, -1), \quad v_2 = (-2, 2, 3, 2), \quad v_3 = (1, 2, 0, -1), \quad v_4 = (1, 0, 0, 1)$

form an orthogonal basis for $R^4$ with the Euclidean inner product; then use Theorem 6.3.2a to express each of
the following as linear combinations of $v_1$, $v_2$, $v_3$, and $v_4$.

(a) $(1, 1, 1, 1)$

(b) $(\sqrt{2}, -3\sqrt{2}, 5\sqrt{2}, -\sqrt{2})$

(c) $\left(-\tfrac{1}{3}, \tfrac{2}{3}, -\tfrac{1}{3}, \tfrac{4}{3}\right)$

11. (a) Show that the vectors

$v_1 = (1, -2, 3, -4), \quad v_2 = (2, 1, -4, -3), \quad v_3 = (-3, 4, 1, -2), \quad v_4 = (4, 3, 2, 1)$

form an orthogonal basis for $R^4$ with the Euclidean inner product.

(b) Use Theorem 6.3.2a to express $u = (-1, 2, 3, 7)$ as a linear combination of the vectors in part (a).

Answer:

(b) $u = -\tfrac{4}{5} v_1 - \tfrac{11}{10} v_2 + 0 v_3 + \tfrac{1}{2} v_4$

In Exercises 12-13, an orthonormal basis with respect to the Euclidean inner product is given. Use Theorem 6.3.2b 
to find the coordinate vector of w with respect to that basis. 

w=(3,7);ui= -p:, - -= L U2 = "7=, -7= 
(b) «r= ( - 1, 0, 2); ui = - ^. Ij, U2 = (|. \, - U3 = |, 1] 

(a) (2. 0. 5). = (f . 1 I). U2 = [\, \. - 1], U3 = (|. - 1. - i) 
( - 1, 1, 2); = f-^, ^ 1 ] , „2 = (--L. 2 . _Li 

«3=f- ' ' 



^66' ^66' ^66^ 



Answer: 



(a),r=^ni-f«2-iu3 



(b) ^=J=„2+_1L, 

/6 /66 



In Exercises 14-15, the given vectors are orthogonal with respect to the Euclidean inner product. Find $\operatorname{proj}_W x$,
where $x = (1, 2, 0, -2)$ and $W$ is the subspace of $R^4$ spanned by the vectors.

14. (a) $v_1 = (1, 1, 1, 1)$, $v_2 = (1, 1, -1, -1)$
(b) $v_1 = (0, 1, -4, -1)$, $v_2 = (3, 5, 1, 1)$

15. (a) $v_1 = (1, 1, 1, 1)$, $v_2 = (1, 1, -1, -1)$, $v_3 = (1, -1, 1, -1)$
(b) $v_1 = (0, 1, -4, -1)$, $v_2 = (3, 5, 1, 1)$, $v_3 = (1, 0, 1, -4)$

Answer:

(a) $\left(\tfrac{7}{4}, \tfrac{5}{4}, -\tfrac{3}{4}, -\tfrac{5}{4}\right)$

(b) $\left(\tfrac{17}{12}, \tfrac{7}{4}, -\tfrac{1}{12}, -\tfrac{23}{12}\right)$

In Exercises 16-17, the given vectors are orthonormal with respect to the Euclidean inner product. Use Theorem
6.3.4b to find $\operatorname{proj}_W x$, where $x = (1, 2, 0, -1)$ and $W$ is the subspace of $R^4$ spanned by the vectors.



16. 



(b) 



-(»-7k-#-7k)-(H.H) 

^ U'2'2'2j'^ [r 2' 2' 2) 



17. 



(a) 



^ U'2'2'2j'^ U'2' 2' 2/^ U' 2' 2' 2j 



Answer: 



(a) /23 li _X .iZl 
US' 6 ' 18' 18 j 

(b) /I 3 .1 . n 

U'2' 2' 2) 

18. In Example 6 of Section 4.9 we found the orthogonal projection of the vector x = (1, 5) onto the line through 
the origin making an angle of jj- / 6 radians with the x-axis. Solve that same problem using Theorem 6.3.4. 

19. Find the vectors wi in Wand W2 in W"^ such that x=wi +W2, where x and ^are as given in 

(a) Exercise 14(a). 

(b) Exercise 15(a). 



Answer: 



(a) ., = (11 .,= (.11, 

(b) .,= 7 5 _3 _5^ ^„=f_l 1 1 _1^ 

""1 ^'4' 4' 4/ ^ 1^ 4' 4' 4' 4) 

20. Find the vectors in and W2 in J^' ' such that x = wi H-W2, where x and are as given in 

(a) Exercise 16(a). 

(b) Exercise 17(a). 

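21. Let $R^2$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{u_1, u_2\}$ into
an orthonormal basis. Draw both sets of basis vectors in the xy-plane.

(a) $u_1 = (1, -3)$, $u_2 = (2, 2)$

(b) $u_1 = (1, 0)$, $u_2 = (3, -5)$

Answer:

(a) $v_1 = \left(\tfrac{1}{\sqrt{10}}, -\tfrac{3}{\sqrt{10}}\right)$, $v_2 = \left(\tfrac{3}{\sqrt{10}}, \tfrac{1}{\sqrt{10}}\right)$

(b) $v_1 = (1, 0)$, $v_2 = (0, -1)$

[Figure: the basis vectors sketched in the xy-plane]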



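22. Let $R^3$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{u_1, u_2, u_3\}$
into an orthonormal basis.

(a) $u_1 = (1, 1, 1)$, $u_2 = (-1, 1, 0)$, $u_3 = (1, 2, 1)$

(b) $u_1 = (1, 0, 0)$, $u_2 = (3, 7, -2)$, $u_3 = (0, 4, 1)$

23. Let $R^4$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis
$\{u_1, u_2, u_3, u_4\}$ into an orthonormal basis.

$u_1 = (0, 2, 1, 0), \quad u_2 = (1, -1, 0, 0), \quad u_3 = (1, 2, 0, -1), \quad u_4 = (1, 0, 0, 1)$

Answer:

$v_1 = \left(0, \tfrac{2}{\sqrt{5}}, \tfrac{1}{\sqrt{5}}, 0\right), \quad v_2 = \left(\tfrac{5}{\sqrt{30}}, -\tfrac{1}{\sqrt{30}}, \tfrac{2}{\sqrt{30}}, 0\right),$

$v_3 = \left(\tfrac{1}{\sqrt{10}}, \tfrac{1}{\sqrt{10}}, -\tfrac{2}{\sqrt{10}}, -\tfrac{2}{\sqrt{10}}\right), \quad v_4 = \left(\tfrac{1}{\sqrt{15}}, \tfrac{1}{\sqrt{15}}, -\tfrac{2}{\sqrt{15}}, \tfrac{3}{\sqrt{15}}\right)$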

24. Let /j3 have the Euclidean inner product. Find an orthonormal basis for the subspace spanned by (0, 1,2), 
(-1.0, 1),(-1, 1,3). 

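25. Let $R^3$ have the inner product

$\langle u, v \rangle = u_1v_1 + 2u_2v_2 + 3u_3v_3$

Use the Gram-Schmidt process to transform $u_1 = (1, 1, 1)$, $u_2 = (1, 1, 0)$, $u_3 = (1, 0, 0)$ into an orthonormal
basis.

Answer:

$v_1 = \left(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\right), \quad v_2 = \left(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, -\tfrac{1}{\sqrt{6}}\right), \quad v_3 = \left(\tfrac{2}{\sqrt{6}}, -\tfrac{1}{\sqrt{6}}, 0\right)$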

Let have the Euclidean inner product. The subspace of /J^ spanned by the vectors Qi = 0, — and 

U2 = (0, 1 , 0) is aplane passing through the origin. Express Hr= (1, 2, 3) in the form wr= Wi + W2, where Vl\ 
lies in the plane and W2 is perpendicular to the plane. 

27. Repeat Exercise 26 with $u_1 = (1, 1, 1)$ and $u_2 = (2, 0, -1)$.

Answer:

$w_1 = \left(\tfrac{13}{14}, \tfrac{31}{14}, \tfrac{40}{14}\right), \quad w_2 = \left(\tfrac{1}{14}, -\tfrac{3}{14}, \tfrac{2}{14}\right)$

28. Let $R^4$ have the Euclidean inner product. Express the vector $w = (-1, 2, 6, 0)$ in the form $w = w_1 + w_2$,
where $w_1$ is in the space $W$ spanned by $u_1 = (-1, 0, 1, 2)$ and $u_2 = (0, 1, 0, 1)$, and $w_2$ is orthogonal to $W$.

29. Find the ^/Z-decomposition of the matrix, where possible. 



(a) 




r 








(b) 


"1 2 






0 1 






1 4 




(c) 


1 


r 




-2 


1 




2 


1 


(d) 


'1 0 


2 




0 1 


1 




1 2 


0 



(e) 



(f) 



1 2 1 
1 1 1 

0 3 1 



1 


0 


1 


-1 


1 


1 


1 


0 


1 


-1 


1 


1 



Answer: 




(f) Columns not linearly independent 



30. In Step 3 of the proof of Theorem 6.3.5, it was stated that "the linear independence of $\{u_1, u_2, \ldots, u_r\}$ ensures
that $v_3 \neq 0$." Prove this statement.

31. Prove that the diagonal entries of $R$ in Formula 15 are nonzero.

32. Calculus required Use Theorem 6.3.2a to express the following polynomials as linear combinations of the first
three Legendre polynomials (see the Remark following Example 8).

(a) $1 + x + 4x^2$

(b) $2 - 7x^2$

(c) $4 + 3x$

33. Calculus required Let $P_2$ have the inner product

$\langle p, q \rangle = \int_0^1 p(x) q(x) \, dx$

Apply the Gram-Schmidt process to transform the standard basis $S = \{1, x, x^2\}$ into an orthonormal basis.

Answer:

$v_1 = 1, \quad v_2 = \sqrt{3}(2x - 1), \quad v_3 = \sqrt{5}(6x^2 - 6x + 1)$

34. Find vectors $x$ and $y$ in $R^2$ that are orthonormal with respect to the inner product $\langle u, v \rangle = 3u_1v_1 + 2u_2v_2$ but
are not orthonormal with respect to the Euclidean inner product.

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) Every linearly independent set of vectors in an inner product space is orthogonal. 
Answer: 

False 

(b) Every orthogonal set of vectors in an inner product space is linearly independent. 
Answer: 

False 

(c) Every nontrivial subspace of $R^3$ has an orthonormal basis with respect to the Euclidean inner product.
Answer: 

True 

(d) Every nonzero finite-dimensional inner product space has an orthonormal basis. 
Answer: 

True 

(e) $\operatorname{proj}_W x$ is orthogonal to every vector of $W$.
Answer: 



False 

(f) If $A$ is an $n \times n$ matrix with a nonzero determinant, then $A$ has a QR-decomposition.
Answer: 
True 
6.4 Best Approximation; Least Squares



In this section we will be concerned with linear systems that cannot be solved exactly and for which an approximate solution is 
needed. Such systems commonly occur in applications where measurement errors "perturb" the coefficients of a consistent system 
sufficiently to produce inconsistency. 



Least Squares Solutions of Linear Systems 

Suppose that $Ax = b$ is an inconsistent linear system of $m$ equations in $n$ unknowns in which we suspect the inconsistency to be
caused by measurement errors in the coefficients of $A$. Since no exact solution is possible, we will look for a vector $x$ that comes as
"close as possible" to being a solution in the sense that it minimizes $\|b - Ax\|$ with respect to the Euclidean inner product on $R^m$.
You can think of $Ax$ as an approximation to $b$ and $\|b - Ax\|$ as the error in that approximation; the smaller the error, the better
the approximation. This leads to the following problem.

Least Squares Problem

Given a linear system $Ax = b$ of $m$ equations in $n$ unknowns, find a vector $x$ that minimizes $\|b - Ax\|$ with respect to the
Euclidean inner product on $R^m$. We call such an $x$ a least squares solution of the system, we call $b - Ax$ the least squares
error vector, and we call $\|b - Ax\|$ the least squares error.

To clarify the above terminology, suppose that the matrix form of $b - Ax$ is

$b - Ax = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix}$

The term "least squares solution" results from the fact that minimizing $\|b - Ax\|$ also minimizes
$\|b - Ax\|^2 = e_1^2 + e_2^2 + \cdots + e_m^2$.



Best Approximation 



Suppose that $b$ is a fixed vector in $R^3$ that we would like to approximate by a vector $w$ that is required to lie in some subspace $W$
of $R^3$. Unless $b$ happens to be in $W$, then any such approximation will result in an "error vector" $b - w$ that cannot be made equal
to $0$ no matter how $w$ is chosen (Figure 6.4.1a). However, by choosing

$w = \operatorname{proj}_W b$

we can make the length of the error vector

$\|b - w\| = \|b - \operatorname{proj}_W b\|$

as small as possible (Figure 6.4.1b).

Figure 6.4.1



These geometric ideas suggest the following general theorem.

THEOREM 6.4.1 Best Approximation Theorem

If $W$ is a finite-dimensional subspace of an inner product space $V$, and if $b$ is a vector in $V$, then $\operatorname{proj}_W b$ is the best
approximation to $b$ from $W$ in the sense that

$\|b - \operatorname{proj}_W b\| < \|b - w\|$

for every vector $w$ in $W$ that is different from $\operatorname{proj}_W b$.

Proof For every vector $w$ in $W$, we can write

$b - w = (b - \operatorname{proj}_W b) + (\operatorname{proj}_W b - w)$    (1)

But $\operatorname{proj}_W b - w$, being a difference of vectors in $W$, is itself in $W$; and since $b - \operatorname{proj}_W b$ is orthogonal to $W$, the two terms on the
right side of 1 are orthogonal. Thus, it follows from the Theorem of Pythagoras (Theorem 6.2.3) that

$\|b - w\|^2 = \|b - \operatorname{proj}_W b\|^2 + \|\operatorname{proj}_W b - w\|^2$

Since $w \neq \operatorname{proj}_W b$, it follows that the second term in this sum is positive, and hence that

$\|b - \operatorname{proj}_W b\|^2 < \|b - w\|^2$

Since norms are nonnegative, it follows (from a property of inequalities) that

$\|b - \operatorname{proj}_W b\| < \|b - w\|$



Least Squares Solutions of Linear Systems 

One way to find a least squares solution of $Ax = b$ is to calculate the orthogonal projection $\operatorname{proj}_W b$ on the column space $W$ of the
matrix $A$ and then solve the equation

$Ax = \operatorname{proj}_W b$    (2)

However, we can avoid the need to calculate the projection by rewriting 2 as

$b - Ax = b - \operatorname{proj}_W b$

and then multiplying both sides of this equation by $A^T$ to obtain

$A^T(b - Ax) = A^T(b - \operatorname{proj}_W b)$    (3)

Since $b - \operatorname{proj}_W b$ is the component of $b$ that is orthogonal to the column space of $A$, it follows from Theorem 4.8.9b that this
vector lies in the null space of $A^T$, and hence that

$A^T(b - \operatorname{proj}_W b) = 0$

Thus, 3 simplifies to

$A^T(b - Ax) = 0$

which we can rewrite as

$A^TAx = A^Tb$    (4)

This is called the normal equation or the normal system associated with $Ax = b$. When viewed as a linear system, the individual
equations are called the normal equations associated with $Ax = b$.

In summary, we have established the following result.

THEOREM 6.4.2

For every linear system $Ax = b$, the associated normal system

$A^TAx = A^Tb$    (5)

is consistent, and all solutions of 5 are least squares solutions of $Ax = b$. Moreover, if $W$ is the column space of $A$, and $x$ is
any least squares solution of $Ax = b$, then the orthogonal projection of $b$ on $W$ is

$\operatorname{proj}_W b = Ax$    (6)

If a linear system is consistent, then its exact solutions are
the same as its least squares solutions, in which case the
error is zero.



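EXAMPLE 1 Least Squares Solution

(a) Find all least squares solutions of the linear system

$x_1 - x_2 = 4$
$3x_1 + 2x_2 = 1$
$-2x_1 + 4x_2 = 3$

(b) Find the error vector and the error.

Solution

(a) It will be convenient to express the system in the matrix form $Ax = b$, where

$A = \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix}$

It follows that

$A^TA = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}$

$A^Tb = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$

so the normal system $A^TAx = A^Tb$ is

$\begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$

Solving this system yields a unique least squares solution, namely,

$x_1 = \tfrac{17}{95}, \quad x_2 = \tfrac{143}{285}$

(b) The error vector is

$b - Ax = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} \tfrac{17}{95} \\ \tfrac{143}{285} \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} -\tfrac{92}{285} \\ \tfrac{439}{285} \\ \tfrac{94}{57} \end{bmatrix} = \begin{bmatrix} \tfrac{1232}{285} \\ -\tfrac{154}{285} \\ \tfrac{77}{57} \end{bmatrix}$

and the error is

$\|b - Ax\| = \tfrac{77}{\sqrt{285}} \approx 4.561$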
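The normal-equation computation for the system of Example 1 can be reproduced with exact arithmetic. The sketch below is an illustrative addition; it forms $A^TA$ and $A^Tb$, solves the resulting 2 x 2 system by Cramer's rule (a choice made here only because the system is small), and evaluates the error vector. Exact arithmetic gives $77/57$ for the third component of the error vector and about 4.561 for the error.

```python
from fractions import Fraction as F
from math import sqrt

A = [[1, -1], [3, 2], [-2, 4]]
b = [4, 1, 3]

# Form A^T A (2 x 2) and A^T b (2 x 1).
AtA = [[sum(F(A[k][i] * A[k][j]) for k in range(3)) for j in range(2)] for i in range(2)]
Atb = [sum(F(A[k][i] * b[k]) for k in range(3)) for i in range(2)]

# Solve the 2 x 2 normal system A^T A x = A^T b by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x1 = (Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det
x2 = (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det

# Error vector b - Ax and its Euclidean norm.
residual = [b[k] - (A[k][0] * x1 + A[k][1] * x2) for k in range(3)]
error = sqrt(sum(r * r for r in residual))
print(x1, x2, residual, round(error, 3))
```

A useful sanity check is that the error vector is orthogonal to both columns of $A$, which is exactly what the normal equations assert.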



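EXAMPLE 2 Orthogonal Projection on a Subspace

Find the orthogonal projection of the vector $u = (-3, -3, 8, 9)$ on the subspace of $R^4$ spanned by the vectors

$u_1 = (3, 1, 0, 1), \quad u_2 = (1, 2, 1, 1), \quad u_3 = (-1, 0, 2, -1)$

Solution We could solve this problem by first using the Gram-Schmidt process to convert $\{u_1, u_2, u_3\}$ into an
orthonormal basis and then applying the method used in Example 6 of Section 6.3. However, the following method
is more efficient.

The subspace $W$ of $R^4$ spanned by $u_1$, $u_2$, and $u_3$ is the column space of the matrix

$A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix}$

Thus, if $u$ is expressed as a column vector, we can find the orthogonal projection of $u$ on $W$ by finding a least
squares solution of the system $Ax = u$ and then calculating $\operatorname{proj}_W u = Ax$ from the least squares solution. The
computations are as follows: The system $Ax = u$ is

$\begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix}$

so

$A^TA = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 11 & 6 & -4 \\ 6 & 7 & 0 \\ -4 & 0 & 6 \end{bmatrix}$

$A^Tu = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix} = \begin{bmatrix} -3 \\ 8 \\ 10 \end{bmatrix}$

The normal system $A^TAx = A^Tu$ in this case is

$\begin{bmatrix} 11 & 6 & -4 \\ 6 & 7 & 0 \\ -4 & 0 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ 8 \\ 10 \end{bmatrix}$

Solving this system yields

$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$

as the least squares solution of $Ax = u$ (verify), so

$\operatorname{proj}_W u = Ax = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \\ 4 \\ 0 \end{bmatrix}$

or, in comma-delimited notation, $\operatorname{proj}_W u = (-2, 3, 4, 0)$.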
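The method of Example 2 (project by solving the normal system, then multiply by $A$) can be sketched as follows. This Python fragment is an illustrative addition; the helper names `solve` and `least_squares` are hypothetical, and the solver is a plain Gauss-Jordan elimination over exact fractions that assumes $A^TA$ is invertible, that is, that the columns of $A$ are linearly independent.

```python
from fractions import Fraction as F

def solve(M, rhs):
    """Gauss-Jordan elimination over exact fractions; assumes M is square and invertible."""
    n = len(M)
    aug = [[F(x) for x in row] + [F(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # first row with a nonzero pivot
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]             # scale pivot row to leading 1
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def least_squares(A, b):
    """Solve the normal system A^T A x = A^T b."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return solve(AtA, Atb)

A = [[3, 1, -1], [1, 2, 0], [0, 1, 2], [1, 1, -1]]   # columns are u1, u2, u3
u = [-3, -3, 8, 9]

x = least_squares(A, u)
proj = [sum(a * xi for a, xi in zip(row, x)) for row in A]   # proj_W u = A x
print(x, proj)
```

No orthogonal basis for $W$ is needed here; the price is solving an $n \times n$ linear system instead.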



Uniqueness of Least Squares Solutions 

In general, least squares solutions of linear systems are not unique. Although the linear system in Example 1 turned out to have a 
unique least squares solution, that occurred only because the coefficient matrix of the system happened to satisfy certain conditions 
that guarantee uniqueness. Our next theorem will show what those conditions are. 

THEOREM 6.4.3

If $A$ is an $m \times n$ matrix, then the following are equivalent.

(a) $A$ has linearly independent column vectors.

(b) $A^TA$ is invertible.

Proof We will prove that (a) $\Rightarrow$ (b) and leave the proof that (b) $\Rightarrow$ (a) as an exercise.

(a) $\Rightarrow$ (b) Assume that $A$ has linearly independent column vectors. The matrix $A^TA$ has size $n \times n$, so we can prove that this
matrix is invertible by showing that the linear system $A^TA\mathbf{x} = \mathbf{0}$ has only the trivial solution. But if $\mathbf{x}$ is any solution of this
system, then $A\mathbf{x}$ is in the null space of $A^T$ and also in the column space of $A$. By Theorem 4.8.9b these spaces are orthogonal
complements, so part (b) of Theorem 6.2.4 implies that $A\mathbf{x} = \mathbf{0}$. But $A$ is assumed to have linearly independent column vectors, so
$\mathbf{x} = \mathbf{0}$ by Theorem 1.3.1.
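As a small numerical illustration of Theorem 6.4.3, the sketch below computes $\det(A^TA)$ for two matrices: the matrix $A$ of Example 2, and an illustrative matrix chosen here whose third column is the sum of the first two. The function names `gram` and `det` are assumptions of this sketch; cofactor expansion is used only because the matrices are small.

```python
from fractions import Fraction

def gram(A):
    """Return A^T A for a matrix A given as a list of rows."""
    cols = list(zip(*A))
    return [[sum(Fraction(a) * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det(M):
    """Determinant by cofactor expansion along the first row (adequate for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

independent = [[3, 1, -1], [1, 2, 0], [0, 1, 2], [1, 1, -1]]  # matrix A of Example 2
dependent = [[-1, 3, 2], [2, 1, 3], [0, 1, 1]]                # third column = first + second

det_independent = det(gram(independent))  # nonzero, so A^T A is invertible
det_dependent = det(gram(dependent))      # zero, so A^T A is not invertible
```

A nonzero determinant corresponds to linearly independent columns, and a zero determinant to dependent columns, in agreement with the theorem.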

As an exercise, try using Formula 7 to solve the problem 
in part (a) of Example 1 . 

The next theorem, which follows directly from Theorem 6.4.2 and Theorem 6.4.3, gives an explicit formula for the least squares 
solution of a linear system in which the coefficient matrix has linearly independent column vectors. 

THEOREM 6.4.4

If $A$ is an $m \times n$ matrix with linearly independent column vectors, then for every $m \times 1$ matrix $\mathbf{b}$, the linear system $A\mathbf{x} = \mathbf{b}$
has a unique least squares solution. This solution is given by
$$\mathbf{x} = (A^TA)^{-1}A^T\mathbf{b} \tag{7}$$
Moreover, if $W$ is the column space of $A$, then the orthogonal projection of $\mathbf{b}$ on $W$ is
$$\operatorname{proj}_W \mathbf{b} = A\mathbf{x} = A(A^TA)^{-1}A^T\mathbf{b} \tag{8}$$



OPTIONAL 

The Role of QR-Decomposition in Least Squares Problems 

Formulas 7 and 8 have theoretical use but are not well suited for numerical computation. In practice, least squares solutions of
$A\mathbf{x} = \mathbf{b}$ are typically found by using some variation of Gaussian elimination to solve the normal equations or by using
QR-decomposition and the following theorem.

THEOREM 6.4.5

If $A$ is an $m \times n$ matrix with linearly independent column vectors, and if $A = QR$ is a QR-decomposition of $A$ (see Theorem
6.3.7), then for each $\mathbf{b}$ in $R^m$ the system $A\mathbf{x} = \mathbf{b}$ has a unique least squares solution given by
$$\mathbf{x} = R^{-1}Q^T\mathbf{b} \tag{9}$$

A proof of this theorem and a discussion of its use can be found in many books on numerical methods of linear algebra. However,
you can obtain Formula 9 by making the substitution $A = QR$ in 7 and using the fact that $Q^TQ = I$ to obtain
$$\begin{aligned} \mathbf{x} &= \left((QR)^T(QR)\right)^{-1}(QR)^T\mathbf{b} \\ &= (R^TQ^TQR)^{-1}(QR)^T\mathbf{b} \\ &= R^{-1}(R^T)^{-1}R^TQ^T\mathbf{b} \\ &= R^{-1}Q^T\mathbf{b} \end{aligned}$$
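Formula 9 can be tried out on the data of Example 2. The sketch below builds a QR-decomposition with the classical Gram-Schmidt process and then solves $R\mathbf{x} = Q^T\mathbf{b}$ by back substitution rather than forming $R^{-1}$. It is an illustration under stated assumptions (floating-point arithmetic, linearly independent columns, function names chosen here), not the method a numerical library would necessarily use.

```python
import math

def qr_gram_schmidt(A):
    """Return (q_cols, R) with A = QR, where q_cols[j] is column j of Q.

    Assumes the columns of A are linearly independent.
    """
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = list(v)
        for i, q in enumerate(q_cols):
            R[i][j] = sum(a * b for a, b in zip(q, v))    # component of v along q_i
            w = [a - R[i][j] * b for a, b in zip(w, q)]   # remove that component
        R[j][j] = math.sqrt(sum(a * a for a in w))
        q_cols.append([a / R[j][j] for a in w])
    return q_cols, R

def least_squares_qr(A, b):
    """Least squares solution of Ax = b from x = R^{-1} Q^T b (Formula 9)."""
    q_cols, R = qr_gram_schmidt(A)
    n = len(R)
    c = [sum(a * t for a, t in zip(q, b)) for q in q_cols]  # c = Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):                            # back substitution for R x = c
        x[i] = (c[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

A = [[3, 1, -1], [1, 2, 0], [0, 1, 2], [1, 1, -1]]  # matrix of Example 2
u = [-3, -3, 8, 9]
x = least_squares_qr(A, u)
```

Up to rounding error, the result agrees with the least squares solution $(-1, 2, 1)$ obtained from the normal system in Example 2.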



Orthogonal Projections on Subspaces of $R^m$



In Section 4.8 we showed how to compute orthogonal projections on the coordinate axes of a rectangular coordinate system in R^ 
and more generally on lines through the origin of R^. We will now consider the problem of finding orthogonal projections on 
subspaces of $R^m$. We begin with the following definition.

DEFINITION 1

If $W$ is a subspace of $R^m$, then the linear transformation $P: R^m \to W$ that maps each vector $\mathbf{x}$ in $R^m$ into its orthogonal
projection $\operatorname{proj}_W \mathbf{x}$ in $W$ is called the orthogonal projection of $R^m$ on $W$.

It follows from Formula 7 that the standard matrix for the transformation $P$ is
$$[P] = A(A^TA)^{-1}A^T \tag{10}$$
where $A$ is constructed using any basis for $W$ as its column vectors.

EXAMPLE 3 The Standard Matrix for an Orthogonal Projection on a Line

We showed in Formula 16 of Section 4.9 that
$$P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}$$
is the standard matrix for the orthogonal projection on the line $W$ through the origin of $R^2$ that makes an angle $\theta$ with
the positive $x$-axis. Derive this result using Formula 10.

Solution The column vectors of $A$ can be formed from any basis for $W$. Since $W$ is one-dimensional, we can take
$\mathbf{w} = (\cos\theta, \sin\theta)$ as the basis vector (Figure 6.4.2), so
$$A = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$
We leave it for you to show that $A^TA$ is the $1 \times 1$ identity matrix. Thus, Formula 10 simplifies to
$$[P] = A(A^TA)^{-1}A^T = AA^T = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = P_\theta$$

Figure 6.4.2
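When $W$ is a line spanned by a single nonzero vector $\mathbf{w}$, the matrix $A^TA$ in Formula 10 is the $1 \times 1$ matrix $[\mathbf{w} \cdot \mathbf{w}]$, so $[P] = \mathbf{w}\mathbf{w}^T / (\mathbf{w} \cdot \mathbf{w})$. The sketch below, with a function name chosen here for illustration, builds this matrix and compares it with $P_\theta$ for the sample angle $\theta = \pi/6$.

```python
import math

def projection_matrix_onto_line(w):
    """Standard matrix A (A^T A)^{-1} A^T when A has the single nonzero column w."""
    ww = sum(c * c for c in w)  # A^T A is the 1x1 matrix [w . w]
    return [[wi * wj / ww for wj in w] for wi in w]

theta = math.pi / 6
c, s = math.cos(theta), math.sin(theta)

P = projection_matrix_onto_line([c, s])
P_theta = [[c * c, s * c], [s * c, s * s]]  # the matrix of Example 3

# Projecting twice should give the same result as projecting once.
P_squared = [[sum(P[i][k] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
```

The same function applies to a line in $R^3$ by passing a vector with three components.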



Another View of Least Squares 

Recall from Theorem 4.8.9 that the null space and row space of an $m \times n$ matrix $A$ are orthogonal complements, as are the null
space of $A^T$ and the column space of $A$. Thus, given a linear system $A\mathbf{x} = \mathbf{b}$ in which $A$ is an $m \times n$ matrix, the Projection
Theorem (6.3.3) tells us that the vectors $\mathbf{x}$ and $\mathbf{b}$ can each be decomposed into sums of orthogonal terms as
$$\mathbf{x} = \mathbf{x}_{\text{row}(A)} + \mathbf{x}_{\text{null}(A)} \quad \text{and} \quad \mathbf{b} = \mathbf{b}_{\text{null}(A^T)} + \mathbf{b}_{\text{col}(A)}$$
where $\mathbf{x}_{\text{row}(A)}$ and $\mathbf{x}_{\text{null}(A)}$ are the orthogonal projections of $\mathbf{x}$ on the row space of $A$ and the null space of $A$, and the vectors
$\mathbf{b}_{\text{null}(A^T)}$ and $\mathbf{b}_{\text{col}(A)}$ are the orthogonal projections of $\mathbf{b}$ on the null space of $A^T$ and the column space of $A$.

In Figure 6.4.3 we have represented the fundamental spaces of $A$ by perpendicular lines in $R^n$ and $R^m$ on which we indicated the
orthogonal projections of $\mathbf{x}$ and $\mathbf{b}$. (This, of course, is only pictorial since the fundamental spaces need not be one-dimensional.)
The figure shows $A\mathbf{x}$ as a point in the column space of $A$ and conveys that $\mathbf{b}_{\text{col}(A)}$ is the point in col($A$) that is closest to $\mathbf{b}$. This
illustrates that the least squares solutions of $A\mathbf{x} = \mathbf{b}$ are the exact solutions of the equation $A\mathbf{x} = \mathbf{b}_{\text{col}(A)}$.

Figure 6.4.3



More on the Equivalence Theorem 

As our final result in the main part of this section we will add one additional part to Theorem 5.1.6. 

THEOREM 6.4.6 Equivalent Statements

If $A$ is an $n \times n$ matrix, then the following statements are equivalent.

(a) $A$ is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \neq 0$.

(h) The column vectors of $A$ are linearly independent.

(i) The row vectors of $A$ are linearly independent.

(j) The column vectors of $A$ span $R^n$.

(k) The row vectors of $A$ span $R^n$.

(l) The column vectors of $A$ form a basis for $R^n$.

(m) The row vectors of $A$ form a basis for $R^n$.

(n) $A$ has rank $n$.

(o) $A$ has nullity 0.

(p) The orthogonal complement of the null space of $A$ is $R^n$.

(q) The orthogonal complement of the row space of $A$ is $\{\mathbf{0}\}$.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.

(t) $\lambda = 0$ is not an eigenvalue of $A$.

(u) $A^TA$ is invertible.

The proof of part (u) follows from part (h) of this theorem and Theorem 6.4.3 applied to square matrices.



OPTIONAL 



We now have all the ingredients needed to prove Theorem 6.3.3 in the special case where $V$ is the vector space $R^n$.

Proof of Theorem 6.3.3 We will leave the case where $W = \{\mathbf{0}\}$ as an exercise, so assume that $W \neq \{\mathbf{0}\}$. Let
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be any basis for $W$, and form the $n \times k$ matrix $M$ that has these basis vectors as successive columns. This makes
$W$ the column space of $M$ and hence $W^\perp$ the null space of $M^T$. We will complete the proof by showing that every vector $\mathbf{u}$ in
$R^n$ can be written in exactly one way as
$$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2$$
where $\mathbf{w}_1$ is in the column space of $M$ and $M^T\mathbf{w}_2 = \mathbf{0}$. However, to say that $\mathbf{w}_1$ is in the column space of $M$ is equivalent to saying
$\mathbf{w}_1 = M\mathbf{x}$ for some vector $\mathbf{x}$ in $R^k$, and to say that $M^T\mathbf{w}_2 = \mathbf{0}$ is equivalent to saying that $M^T(\mathbf{u} - \mathbf{w}_1) = \mathbf{0}$. Thus, if we can
show that the equation
$$M^T(\mathbf{u} - M\mathbf{x}) = \mathbf{0} \tag{11}$$
has a unique solution for $\mathbf{x}$, then $\mathbf{w}_1 = M\mathbf{x}$ and $\mathbf{w}_2 = \mathbf{u} - \mathbf{w}_1$ will be uniquely determined vectors with the required properties. To
do this, let us rewrite 11 as
$$M^TM\mathbf{x} = M^T\mathbf{u}$$
Since the matrix $M$ has linearly independent column vectors, the matrix $M^TM$ is invertible by Theorem 6.4.6 and hence the
equation has a unique solution as required to complete the proof.



Concept Review 

• Least squares problem 

• Least squares solution 

• Least squares error vector 

• Least squares error 

• Best approximation 

• Normal equation 

• Orthogonal projection 

Skills 

• Find the least squares solution of a linear system. 

• Find the error and error vector associated with a least squares solution to a linear system. 

• Use the techniques developed in this section to compute orthogonal projections. 

• Find the standard matrix of an orthogonal projection. 



Exercise Set 6.4 

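1. Find the normal system associated with the given linear system.

(a) $$\begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$$

(b) $$\begin{bmatrix} 2 & -1 & 0 \\ 3 & 1 & 2 \\ -1 & 4 & 5 \\ 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 2 \end{bmatrix}$$

Answer:

(a) $$\begin{bmatrix} 21 & 25 \\ 25 & 35 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 20 \\ 20 \end{bmatrix}$$

(b) $$\begin{bmatrix} 15 & -1 & 5 \\ -1 & 22 & 30 \\ 5 & 30 & 45 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 9 \\ 13 \end{bmatrix}$$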

In Exercises 2-4, find the least squares solution of the linear equation $A\mathbf{x} = \mathbf{b}$.



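2. (a) $$A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$$

(b) $$A = \begin{bmatrix} 2 & -2 \\ 1 & 1 \\ 3 & 1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$$

3. (a) $$A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \\ -1 & 2 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 7 \\ 0 \\ -7 \end{bmatrix}$$

(b) $$A = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & -2 \\ 1 & 1 & 0 \\ 1 & 1 & -1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 6 \\ 0 \\ 9 \\ 3 \end{bmatrix}$$

Answer:

(a) $x_1 = 5$, $x_2 = \frac{1}{2}$

(b) $x_1 = 12$, $x_2 = -3$, $x_3 = 9$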

4- (a) 

i4 = 



(b) 



3 


2 


-1 




2 


1 


-4 


3 


;b = 


-2 


1 


10 


-7 




1 


2 


0 


-l" 




'0" 


1 


-2 


2 


;b = 


6 


2 


-1 


0 


0 


0 


1 


-1 




6 



A = 



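In Exercises 5-6, find the least squares error vector $\mathbf{e} = \mathbf{b} - A\mathbf{x}$ resulting from the least squares solution $\mathbf{x}$ and verify that it is
orthogonal to the column space of $A$.

5. (a) $A$ and $\mathbf{b}$ are as in Exercise 3(a).
(b) $A$ and $\mathbf{b}$ are as in Exercise 3(b).

Answer:

(a) $$\mathbf{e} = \begin{bmatrix} \frac{3}{2} \\ \frac{9}{2} \\ -3 \end{bmatrix}$$

(b) $$\mathbf{e} = \begin{bmatrix} 3 \\ -3 \\ 0 \\ 3 \end{bmatrix}$$

6. (a) $A$ and $\mathbf{b}$ are as in Exercise 4(a).
(b) $A$ and $\mathbf{b}$ are as in Exercise 4(b).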

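7. Find all least squares solutions of $A\mathbf{x} = \mathbf{b}$ and confirm that all of the solutions have the same error vector. Compute the least
squares error.

(a) $$A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \\ -2 & 1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$$

(b) $$A = \begin{bmatrix} 1 & 3 \\ -2 & -6 \\ 3 & 9 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$$

(c) $$A = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 7 \\ 0 \\ -7 \end{bmatrix}$$

Answer:

(a) Solution: $\mathbf{x} = \left(\frac{1}{10}, \frac{6}{5}\right)$; least squares error: $\frac{4}{5}\sqrt{5}$

(b) Solution: $\mathbf{x} = \left(\frac{2}{7}, 0\right) + t(-3, 1)$ ($t$ a real number); least squares error: $\frac{1}{7}\sqrt{42}$

(c) Solution: $\mathbf{x} = \left(-\frac{7}{6}, \frac{7}{6}, 0\right) + t(-1, -1, 1)$ ($t$ a real number); least squares error: $\frac{1}{2}\sqrt{294}$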

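8. Find the orthogonal projection of $\mathbf{u}$ on the subspace of $R^3$ spanned by the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$.

(a) $\mathbf{u} = (2, 1, 3)$; $\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (1, 2, 1)$

(b) $\mathbf{u} = (1, -6, 1)$; $\mathbf{v}_1 = (-1, 2, 1)$, $\mathbf{v}_2 = (2, 2, 4)$

9. Find the orthogonal projection of $\mathbf{u}$ on the subspace of $R^4$ spanned by the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

(a) $\mathbf{u} = (6, 3, 9, 6)$; $\mathbf{v}_1 = (2, 1, 1, 1)$, $\mathbf{v}_2 = (1, 0, 1, 1)$, $\mathbf{v}_3 = (-2, -1, 0, -1)$

(b) $\mathbf{u} = (-2, 0, 2, 4)$; $\mathbf{v}_1 = (1, 1, 3, 0)$, $\mathbf{v}_2 = (-2, -1, -2, 1)$, $\mathbf{v}_3 = (-3, -1, 1, 3)$

Answer:

(a) $(7, 2, 9, 5)$

(b) $\left(-\frac{12}{5}, -\frac{4}{5}, \frac{12}{5}, \frac{16}{5}\right)$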

10. Find the orthogonal projection of $\mathbf{u} = (5, 6, 7, 2)$ on the solution space of the homogeneous linear system
$$\begin{aligned} x_1 + x_2 + x_3 &= 0 \\ 2x_2 + x_3 + x_4 &= 0 \end{aligned}$$

11. In each part, find $\det(A^TA)$, and apply Theorem 6.4.3 to determine whether $A$ has linearly independent column vectors.



(a) 



A = 



(b) 



A = 



-1 


3 


2 


2 


1 


3 


0 


1 


1 


2 


-1 


0 




1 


-1 




0 


4 


-5 



Answer: 

(a) $\det(A^TA) = 0$; $A$ does not have linearly independent column vectors.

(b) $\det(A^TA) = 0$; $A$ does not have linearly independent column vectors.

12. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection $P: R^2 \to R^2$ onto

(a) the $x$-axis.

(b) the $y$-axis.

[Note: Compare your results to Table 3 of Section 4.9.]

13. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection $P: R^3 \to R^3$ onto

(a) the $xz$-plane.

(b) the $yz$-plane.

[Note: Compare your results to Table 4 of Section 4.9.]

Answer:

(a) $$[P] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(b) $$[P] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$



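14. Show that if $\mathbf{w} = (a, b, c)$ is a nonzero vector, then the standard matrix for the orthogonal projection of $R^3$ on the line
span$\{\mathbf{w}\}$ is
$$P = \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix}$$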



15. Let $W$ be the plane with equation $5x - 3y + z = 0$.

(a) Find a basis for $W$.

(b) Use Formula 10 to find the standard matrix for the orthogonal projection on $W$.

(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point $P_0(x_0, y_0, z_0)$ on $W$.

(d) Find the distance between the point $P_0(1, -2, 4)$ and the plane $W$, and check your result using Theorem 3.3.4.

Answer:

(a) $(1, 0, -5)$, $(0, 1, 3)$

(b) $$[P] = \frac{1}{35} \begin{bmatrix} 10 & 15 & -5 \\ 15 & 26 & 3 \\ -5 & 3 & 34 \end{bmatrix}$$

(c) $\left(\dfrac{2x_0 + 3y_0 - z_0}{7}, \dfrac{15x_0 + 26y_0 + 3z_0}{35}, \dfrac{-5x_0 + 3y_0 + 34z_0}{35}\right)$

(d) $\dfrac{3\sqrt{35}}{7}$

16. Let $W$ be the line with parametric equations
$$x = 2t, \quad y = -t, \quad z = 4t$$

(a) Find a basis for $W$.

(b) Use Formula 10 to find the standard matrix for the orthogonal projection on $W$.

(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point $P_0(x_0, y_0, z_0)$ on $W$.

(d) Find the distance between the point $P_0(2, 1, -3)$ and the line $W$.

17. In $R^3$, consider the line $l$ given by the equations
$$x = t, \quad y = t, \quad z = t$$
and the line $m$ given by the equations
$$x = s, \quad y = 2s - 1, \quad z = 1$$
Let $P$ be a point on $l$, and let $Q$ be a point on $m$. Find the values of $t$ and $s$ that minimize the distance between the lines by
minimizing the squared distance $\|P - Q\|^2$.

Answer:
$s = t = 1$

18. Prove: If $A$ has linearly independent column vectors, and if $A\mathbf{x} = \mathbf{b}$ is consistent, then the least squares solution of $A\mathbf{x} = \mathbf{b}$ and
the exact solution of $A\mathbf{x} = \mathbf{b}$ are the same.

19. Prove: If $A$ has linearly independent column vectors, and if $\mathbf{b}$ is orthogonal to the column space of $A$, then the least squares
solution of $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = \mathbf{0}$.

20. Let $P: R^m \to W$ be the orthogonal projection of $R^m$ onto a subspace $W$.

(a) Prove that $[P]^2 = [P]$.

(b) What does the result in part (a) imply about the composition $P \circ P$?

(c) Show that $[P]$ is symmetric.

21. Let $A$ be an $m \times n$ matrix with linearly independent row vectors. Find a standard matrix for the orthogonal projection of $R^n$
onto the row space of $A$. [Hint: Start with Formula 10.]

Answer:

$[P] = A^T(AA^T)^{-1}A$

22. Prove the implication (b) $\Rightarrow$ (a) of Theorem 6.4.3.

True-False Exercises 

In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 

(a) If $A$ is an $m \times n$ matrix, then $A^TA$ is a square matrix.
Answer:

True

(b) If $A^TA$ is invertible, then $A$ is invertible.
Answer:

False

(c) If $A$ is invertible, then $A^TA$ is invertible.
Answer:

True

(d) If $A\mathbf{x} = \mathbf{b}$ is a consistent linear system, then $A^TA\mathbf{x} = A^T\mathbf{b}$ is also consistent.
Answer:

True

(e) If $A\mathbf{x} = \mathbf{b}$ is an inconsistent linear system, then $A^TA\mathbf{x} = A^T\mathbf{b}$ is also inconsistent.
Answer:

False

(f) Every linear system has a least squares solution.
Answer:

True

(g) Every linear system has a unique least squares solution.
Answer:

False

(h) If $A$ is an $m \times n$ matrix with linearly independent columns and $\mathbf{b}$ is in $R^m$, then $A\mathbf{x} = \mathbf{b}$ has a unique least squares solution.
Answer:

True

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



6.5 Least Squares Fitting to Data 

In this section we will use results about orthogonal projections in inner product spaces to obtain a technique 
for fitting a line or other polynomial curve to a set of experimentally determined points in the plane. 



Fitting a Curve to Data 

A common problem in experimental work is to obtain a mathematical relationship $y = f(x)$ between two
variables $x$ and $y$ by "fitting" a curve to points in the plane corresponding to various experimentally
determined values of $x$ and $y$, say
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$

On the basis of theoretical considerations or simply by observing the pattern of the points, the experimenter
decides on the general form of the curve $y = f(x)$ to be fitted. Some possibilities are (Figure 6.5.1)

(a) A straight line: $y = a + bx$

(b) A quadratic polynomial: $y = a + bx + cx^2$

(c) A cubic polynomial: $y = a + bx + cx^2 + dx^3$

Because the points are obtained experimentally, there is often some measurement "error" in the data, making
it impossible to find a curve of the desired form that passes through all the points. Thus, the idea is to choose
the curve (by determining its coefficients) that "best" fits the data. We begin with the simplest and most
common case: fitting a straight line to data points.





Figure 6.5.1 (a) $y = a + bx$; (b) $y = a + bx + cx^2$; (c) $y = a + bx + cx^2 + dx^3$



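Least Squares Fit of a Straight Line

Suppose we want to fit a straight line $y = a + bx$ to the experimentally determined points
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$
If the data points were collinear, the line would pass through all $n$ points, and the unknown coefficients $a$ and
$b$ would satisfy the equations
$$\begin{aligned} y_1 &= a + bx_1 \\ y_2 &= a + bx_2 \\ &\ \ \vdots \\ y_n &= a + bx_n \end{aligned}$$
We can write this system in matrix form as
$$\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
or more compactly as
$$M\mathbf{v} = \mathbf{y} \tag{1}$$
where
$$M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \tag{2}$$
If the data points are not collinear, then it is impossible to find coefficients $a$ and $b$ that satisfy system 1
exactly; that is, the system is inconsistent. In this case we will look for a least squares solution
$$\mathbf{v} = \mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix}$$
We call a line $y = a^* + b^*x$ whose coefficients come from a least squares solution a regression line or a
least squares straight line fit to the data. To explain this terminology, recall that a least squares solution of 1
minimizes
$$\|\mathbf{y} - M\mathbf{v}\| \tag{3}$$
If we express the square of 3 in terms of components, we obtain
$$\|\mathbf{y} - M\mathbf{v}\|^2 = (y_1 - a - bx_1)^2 + (y_2 - a - bx_2)^2 + \cdots + (y_n - a - bx_n)^2 \tag{4}$$
If we now let
$$d_1 = |y_1 - a - bx_1|, \quad d_2 = |y_2 - a - bx_2|, \quad \ldots, \quad d_n = |y_n - a - bx_n|$$
then 4 can be written as
$$\|\mathbf{y} - M\mathbf{v}\|^2 = d_1^2 + d_2^2 + \cdots + d_n^2 \tag{5}$$
As illustrated in Figure 6.5.2, the number $d_i$ can be interpreted as the vertical distance between the line
$y = a + bx$ and the data point $(x_i, y_i)$. This distance is a measure of the "error" at the point $(x_i, y_i)$
resulting from the inexact fit of $y = a + bx$ to the data points, the assumption being that the $x_i$ are known
exactly and that all the error is in the measurement of the $y_i$. Since 3 and 5 are minimized by the same vector
$\mathbf{v}^*$, the least squares straight line fit minimizes the sum of the squares of the estimated errors $d_i$, hence the
name least squares straight line fit.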
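To make the quantity being minimized concrete, the sketch below evaluates the sum of squared vertical errors $d_1^2 + \cdots + d_n^2$ for the four points (0, 1), (1, 3), (2, 4), (3, 4) and the line $y = 1.5 + x$ that Example 1 of this section obtains for them, and compares it with a few nearby lines. The function name and the choice of perturbations are illustrative assumptions.

```python
def sum_squared_errors(points, a, b):
    """Sum of d_i^2, where d_i = |y_i - (a + b x_i)| is the vertical error at (x_i, y_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)

points = [(0, 1), (1, 3), (2, 4), (3, 4)]

# Line reported in Example 1 as the least squares fit.
best = sum_squared_errors(points, 1.5, 1.0)

# Nearby lines obtained by nudging the intercept and slope.
steps = (-0.1, 0.0, 0.1)
others = [
    sum_squared_errors(points, 1.5 + da, 1.0 + db)
    for da in steps
    for db in steps
    if (da, db) != (0.0, 0.0)
]
```

Every perturbed line gives a strictly larger sum than `best`, which is consistent with the least squares line being the unique minimizer for these data.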



Figure 6.5.2 $d_i$ measures the vertical error in the least squares straight line.

Normal Equations 

Recall from Theorem 6.4.2 that the least squares solutions of 1 can be obtained by solving the associated
normal system
$$M^TM\mathbf{v} = M^T\mathbf{y}$$
the equations of which are called the normal equations.

In the exercises it will be shown that the column vectors of $M$ are linearly independent if and only if the $n$ data
points do not lie on a vertical line in the $xy$-plane. In this case it follows from Theorem 6.4.4 that the least
squares solution is unique and is given by
$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y}$$
In summary, we have the following theorem.

THEOREM 6.5.1 Uniqueness of the Least Squares Solution

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of two or more data points, not all lying on a vertical
line, and let
$$M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad \text{and} \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Then there is a unique least squares straight line fit
$$y = a^* + b^*x$$
to the data points. Moreover,
$$\mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix}$$
is given by the formula
$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} \tag{6}$$
which expresses the fact that $\mathbf{v} = \mathbf{v}^*$ is the unique solution of the normal equations
$$M^TM\mathbf{v} = M^T\mathbf{y} \tag{7}$$

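EXAMPLE 1 Least Squares Straight Line Fit

Find the least squares straight line fit to the four points (0, 1), (1, 3), (2, 4), and (3, 4). (See
Figure 6.5.3.)

Figure 6.5.3

Solution We have
$$M = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad M^TM = \begin{bmatrix} 4 & 6 \\ 6 & 14 \end{bmatrix}, \quad (M^TM)^{-1} = \frac{1}{10} \begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}$$
$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} = \frac{1}{10} \begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$$
so the desired line is $y = 1.5 + x$.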
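For a straight line the normal equations form a $2 \times 2$ system whose entries are sums over the data, so they can be solved in closed form by Cramer's rule. The sketch below does this with exact rational arithmetic for the points of Example 1; the function name `fit_line` is an illustrative choice, and the vertical-line check reflects the hypothesis of Theorem 6.5.1.

```python
from fractions import Fraction

def fit_line(points):
    """Least squares line y = a + b x from the normal equations M^T M v = M^T y.

    M^T M = [[n, sum x], [sum x, sum x^2]] and M^T y = [sum y, sum x*y].
    """
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) ** 2 for x, _ in points)
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)
    det = n * sxx - sx * sx  # zero exactly when all x-values coincide
    if det == 0:
        raise ValueError("data points lie on a vertical line; the fit is not unique")
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = fit_line([(0, 1), (1, 3), (2, 4), (3, 4)])
```

For these four points the intermediate sums reproduce the matrix $M^TM$ shown in the example, and the result is the line $y = 1.5 + x$.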







EXAMPLE 2 Spring Constant

Hooke's law in physics states that the length $x$ of a uniform spring is a linear function of the
force $y$ applied to it. If we express this relationship as $y = a + bx$, then the coefficient $b$ is
called the spring constant. Suppose a particular unstretched spring has a measured length of 6.1
inches (i.e., $x = 6.1$ when $y = 0$). Forces of 2 pounds, 4 pounds, and 6 pounds are then applied
to the spring, and the corresponding lengths are found to be 7.6 inches, 8.7 inches, and 10.4
inches (see Figure 6.5.4). Find the spring constant.

Figure 6.5.4

Length x (inches)    Force y (pounds)
6.1                  0
7.6                  2
8.7                  4
10.4                 6

Solution We have
$$M = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix}$$
and
$$\mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^TM)^{-1}M^T\mathbf{y} \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}$$
where the numerical values have been rounded to one decimal place. Thus, the estimated value
of the spring constant is $b^* \approx 1.4$ pounds/inch.



[Figure: Magellan data plotted against altitude $h$ (km). Source: NASA]

Historical Note On October 5, 1991 the Magellan spacecraft entered the atmosphere of Venus and
transmitted the temperature $T$ in kelvins (K) versus the altitude $h$ in kilometers (km) until its signal
was lost at an altitude of about 34 km. Discounting the initial erratic signal, the data strongly
suggested a linear relationship, so a least squares straight line fit was used on the linear part of the
data to obtain the equation
$$T = 737.5 - 8.125h$$
By setting $h = 0$ in this equation, the surface temperature of Venus was estimated at $T \approx 737.5$ K.

Least Squares Fit of a Polynomial



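The technique described for fitting a straight line to data points can be generalized to fitting a polynomial of
specified degree to data points. Let us attempt to fit a polynomial of fixed degree $m$
$$y = a_0 + a_1x + \cdots + a_mx^m \tag{8}$$
to $n$ points
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$
Substituting these $n$ values of $x$ and $y$ into 8 yields the $n$ equations
$$\begin{aligned} y_1 &= a_0 + a_1x_1 + \cdots + a_mx_1^m \\ y_2 &= a_0 + a_1x_2 + \cdots + a_mx_2^m \\ &\ \ \vdots \\ y_n &= a_0 + a_1x_n + \cdots + a_mx_n^m \end{aligned}$$
or, in matrix form,
$$\mathbf{y} = M\mathbf{v} \tag{9}$$
where
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} \tag{10}$$
As before, the solutions of the normal equations
$$M^TM\mathbf{v} = M^T\mathbf{y}$$
determine the coefficients of the polynomial, and the vector $\mathbf{v}$ minimizes
$$\|\mathbf{y} - M\mathbf{v}\|$$
Conditions that guarantee the invertibility of $M^TM$ are discussed in the exercises (Exercise 7). If $M^TM$ is
invertible, then the normal equations have a unique solution $\mathbf{v} = \mathbf{v}^*$, which is given by
$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} \tag{11}$$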



EXAMPLE 3 Fitting a Quadratic Curve to Data

According to Newton's second law of motion, a body near the Earth's surface falls vertically
downward according to the equation
$$s = s_0 + v_0t + \tfrac{1}{2}gt^2 \tag{12}$$
where

$s$ = vertical displacement downward relative to some fixed point
$s_0$ = initial displacement at time $t = 0$
$v_0$ = initial velocity at time $t = 0$
$g$ = acceleration of gravity at the Earth's surface

Suppose that a laboratory experiment is performed to evaluate $g$ from Equation 12 by releasing a weight
with unknown initial displacement and velocity and measuring the distance it has fallen at certain times
relative to a fixed reference point. Suppose it is found that at times
$t = .1, .2, .3, .4$, and $.5$ seconds the weight has fallen $s = -0.18, 0.31, 1.03, 2.48$, and $3.73$
feet, respectively, from the reference point. Find an approximate value of $g$ using these data.

Solution The mathematical problem is to fit a quadratic curve
$$s = a_0 + a_1t + a_2t^2 \tag{13}$$
to the five data points:
$$(.1, -0.18), \quad (.2, 0.31), \quad (.3, 1.03), \quad (.4, 2.48), \quad (.5, 3.73)$$
With the appropriate adjustments in notation, the matrices $M$ and $\mathbf{y}$ in 10 are
$$M = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \\ 1 & t_5 & t_5^2 \end{bmatrix} = \begin{bmatrix} 1 & .1 & .01 \\ 1 & .2 & .04 \\ 1 & .3 & .09 \\ 1 & .4 & .16 \\ 1 & .5 & .25 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix} = \begin{bmatrix} -0.18 \\ 0.31 \\ 1.03 \\ 2.48 \\ 3.73 \end{bmatrix}$$
Thus, from 11,
$$\mathbf{v}^* = \begin{bmatrix} a_0^* \\ a_1^* \\ a_2^* \end{bmatrix} = (M^TM)^{-1}M^T\mathbf{y} \approx \begin{bmatrix} -0.40 \\ 0.35 \\ 16.1 \end{bmatrix}$$
From 12 and 13, we have $a_2 = \tfrac{1}{2}g$, so the estimated value of $g$ is
$$g = 2a_2^* = 2(16.1) = 32.2 \text{ feet/second}^2$$
If desired, we can also estimate the initial displacement and initial velocity of the weight:
$$s_0 = a_0^* = -0.40 \text{ feet}$$
$$v_0 = a_1^* = 0.35 \text{ feet/second}$$
In Figure 6.5.5 we have plotted the five data points and the approximating polynomial.

Figure 6.5.5 (horizontal axis: time $t$ in seconds)
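The normal equations for a polynomial fit can be assembled and solved directly. The sketch below does this for the data of Example 3 using exact rational arithmetic, with the measurements entered as decimal strings so that no rounding occurs on input. The function names are illustrative, and `solve` assumes $M^TM$ is invertible. Note that the unrounded fit gives $a_2^* = 225/14 \approx 16.07$, so $2a_2^* \approx 32.14$; the value 32.2 in the example comes from rounding $a_2^*$ to 16.1 before doubling.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def polyfit_normal_equations(xs, ys, degree):
    """Coefficients a_0, ..., a_m of the least squares polynomial of the given degree."""
    rows = [[Fraction(x) ** k for k in range(degree + 1)] for x in xs]  # rows of M
    size = degree + 1
    MtM = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    Mty = [sum(r[i] * Fraction(y) for r, y in zip(rows, ys)) for i in range(size)]
    return solve(MtM, Mty)

times = ["0.1", "0.2", "0.3", "0.4", "0.5"]
falls = ["-0.18", "0.31", "1.03", "2.48", "3.73"]

a0, a1, a2 = polyfit_normal_equations(times, falls, 2)
g_estimate = 2 * a2  # since a_2 = g / 2
```

Rounding `a0`, `a1`, and `a2` to the precision used in the example recovers the values displayed there.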



Concept Review 

• Least squares straight line fit 

• Regression line 

• Least squares polynomial fit 

Skills 



• Find the least squares straight line fit to a set of data points.

• Find the least squares polynomial fit to a set of data points.

• Use the techniques of this section to solve applied problems.



Exercise Set 6.5 

1. Find the least squares straight line fit to the three points (0, 0), (1, 2), and (2, 1).

Answer:

$y = \frac{1}{2} + \frac{1}{2}x$

2. Find the least squares straight line fit to the four points (0, 1), (2, 0), (3, 1), and (3, 2).

3. Find the quadratic polynomial that best fits the four points (2, 0), (3, -10), (5, -48), and (6, -76).

Answer:

$y = 2 + 5x - 3x^2$

4. Find the cubic polynomial that best fits the five points (-1, -14), (0, -5), (1, -4), (2, 1), and
(3, 22).

5. Show that the matrix $M$ in Equation 2 has linearly independent columns if and only if at least two of the
numbers $x_1, x_2, \ldots, x_n$ are distinct.

6. Show that the columns of the $n \times (m + 1)$ matrix $M$ in Equation 10 are linearly independent if $n > m$ and
at least $m + 1$ of the numbers $x_1, x_2, \ldots, x_n$ are distinct. [Hint: A nonzero polynomial of degree $m$ has at
most $m$ distinct roots.]

7. Let $M$ be the matrix in Equation 10. Using Exercise 6, show that a sufficient condition for the matrix
$M^TM$ to be invertible is that $n > m$ and that at least $m + 1$ of the numbers $x_1, x_2, \ldots, x_n$ are distinct.

8. The owner of a rapidly expanding business finds that for the first five months of the year the sales (in 
thousands) are $4.0, $4.4, $5.2, $6.4, and $8.0. The owner plots these figures on a graph and conjectures 
that for the rest of the year, the sales curve can be approximated by a quadratic polynomial. Find the least 
squares quadratic polynomial fit to the sales curve, and use it to project the sales for the twelfth month of 
the year. 

9. A corporation obtains the following data relating the number of sales representatives on its staff to annual
sales:

Number of Sales Representatives    5      10     15     20     25     30
Annual Sales (millions)            3.4    4.3    5.2    6.1    7.2    8.3

Explain how you might use least squares methods to estimate the annual sales with 45 representatives, and
discuss the assumptions that you are making. (You need not perform the actual computations.)



10. Pathfinder is an experimental, lightweight, remotely piloted, solar-powered aircraft that was used in a series
of experiments by NASA to determine the feasibility of applying solar power for long-duration, high-altitude
flight. In August 1997 Pathfinder recorded the data in the accompanying table relating altitude $H$
and temperature $T$. Show that a linear model is reasonable by plotting the data, and then find the least
squares line $H = H_0 + kT$ of best fit.

Table Ex-10

Altitude H (thousands of feet)    15     20      25       30       35       40       45
Temperature T (°C)                4.5    -5.9    -16.1    -27.6    -39.8    -50.2    -62.9

11. Find a curve of the form $y = a + (b/x)$ that best fits the data points (1, 7), (3, 3), (6, 1) by making the
substitution $X = 1/x$. Draw the curve and plot the data points in the same coordinate system.

Answer:

$y = \dfrac{5}{21} + \dfrac{48}{7x}$


True-False Exercises 

In parts (a)-(d) determine whether the statement is true or false, and justify your answer. 

(a) Every set of data points has a unique least squares straight line fit. 
Answer: 

False 

(b) If the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are not collinear, then 1 is an inconsistent system.
Answer:

True

(c) If $y = a + bx$ is the least squares line fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then
$d_i = |y_i - (a + bx_i)|$ is minimal for every $1 \le i \le n$.

Answer:

False

(d) If $y = a + bx$ is the least squares line fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then
$\sum_{i=1}^{n} [y_i - (a + bx_i)]^2$ is minimal.

Answer: 

True 




6.6 Function Approximation; Fourier Series 



In this section we will show how orthogonal projections can be used to approximate certain types of functions by
simpler functions that are easier to work with. The ideas explained here have important applications in
engineering and science. Calculus is required.



Best Approximations 

All of the problems that we will study in this section will be special cases of the following general problem.

APPROXIMATION PROBLEM

Given a function $f$ that is continuous on an interval $[a, b]$, find the "best possible approximation" to $f$
using only functions from a specified subspace $W$ of $C[a, b]$.

Here are some examples of such problems:

(a) Find the best possible approximation to $e^x$ over $[0, 1]$ by a polynomial of the form $a_0 + a_1x + a_2x^2$.

(b) Find the best possible approximation to $\sin \pi x$ over $[-1, 1]$ by a function of the form $a_0 + a_1e^x + a_2e^{2x} + a_3e^{3x}$.

(c) Find the best possible approximation to $x$ over $[0, 2\pi]$ by a function of the form $a_0 + a_1 \sin x + a_2 \sin 2x + b_1 \cos x + b_2 \cos 2x$.

In the first example $W$ is the subspace of $C[0, 1]$ spanned by $1$, $x$, and $x^2$; in the second example $W$ is the
subspace of $C[-1, 1]$ spanned by $1$, $e^x$, $e^{2x}$, and $e^{3x}$; and in the third example $W$ is the subspace of
$C[0, 2\pi]$ spanned by $1$, $\sin x$, $\sin 2x$, $\cos x$, and $\cos 2x$.



Measurements of Error 

To solve approximation problems of the preceding types, we first need to make the phrase "best
approximation over $[a, b]$" mathematically precise. To do this we will need some way of quantifying the
error that results when one continuous function is approximated by another over an interval $[a, b]$. If we
were to approximate $f(x)$ by $g(x)$, and if we were concerned only with the error in that approximation at a
single point $x_0$, then it would be natural to define the error to be
$$\text{error} = |f(x_0) - g(x_0)|$$
sometimes called the deviation between $f$ and $g$ at $x_0$ (Figure 6.6.1). However, we are not concerned simply
with measuring the error at a single point but rather with measuring it over the entire interval $[a, b]$. The
problem is that an approximation may have small deviations in one part of the interval and large deviations in
another. One possible way of accounting for this is to integrate the deviation $|f(x) - g(x)|$ over the interval
$[a, b]$ and define the error over the interval to be
$$\text{error} = \int_a^b |f(x) - g(x)|\,dx \tag{1}$$
Geometrically, 1 is the area between the graphs of $f(x)$ and $g(x)$ over the interval $[a, b]$ (Figure 6.6.2); the
greater the area, the greater the overall error.




Figure 6.6.1 The deviation between $f$ and $g$ at $x_0$

Figure 6.6.2 The area between the graphs of $f$ and $g$ over $[a, b]$ measures the error in approximating $f$
by $g$ over $[a, b]$

Although 1 is natural and appealing geometrically, most mathematicians and scientists generally favor the
following alternative measure of error, called the mean square error:
$$\text{mean square error} = \int_a^b [f(x) - g(x)]^2\,dx$$
Mean square error emphasizes the effect of larger errors because of the squaring and has the added advantage
that it allows us to bring to bear the theory of inner product spaces. To see how, suppose that $\mathbf{f}$ is a continuous
function on $[a, b]$ that we want to approximate by a function $\mathbf{g}$ from a subspace $W$ of $C[a, b]$, and suppose
that $C[a, b]$ is given the inner product
$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx$$
It follows that
$$\|\mathbf{f} - \mathbf{g}\|^2 = \langle \mathbf{f} - \mathbf{g}, \mathbf{f} - \mathbf{g} \rangle = \int_a^b [f(x) - g(x)]^2\,dx = \text{mean square error}$$
so minimizing the mean square error is the same as minimizing $\|\mathbf{f} - \mathbf{g}\|^2$. Thus the approximation problem
posed informally at the beginning of this section can be restated more precisely as follows.



Least Squares Approximation 



r 



LEAST SQUARES APPROXIMATION PROBLEM 

Let f be a function that is continuous on an interval [a, b ] , let C[a, b ] have the inner product 

:)g(x) dx 

and let ^Fbe a finite-dimensional subspace oiC[a,b]. Find a function g in fFthat minimizes 



iif 



Since ||f — g|| and ||f — g|| are minimized by the same function g, this problem is equivalent to looking for a 

function g in Wt\mi is closest to f. But we know from Theorem 6.4.1 that g = projw f is such a function 
(Figure 6.6.3). 

f = function in C\a, b] 
to be appn)ximated 



W 




g = proj ^.f = least squares 
approximation 
subspace of to f from W 

approximating 
functions 

Figure 6.6.3 



Thus, we have the following result. 



THEOREM 6.6.1 

If f is a continuous function on [a, b], and W is a finite-dimensional subspace of C[a, b], then the function g in W that minimizes the mean square error

$$\int_a^b [f(x) - g(x)]^2 \, dx$$

is $g = \operatorname{proj}_W f$, where the orthogonal projection is relative to the inner product

$$\langle f, g \rangle = \int_a^b f(x) g(x) \, dx$$

The function $g = \operatorname{proj}_W f$ is called the least squares approximation to f from W.
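As a rough illustration of the theorem, the sketch below computes a least squares approximation from a two-dimensional subspace W = span{w1, w2}. It does not use an orthonormal basis; instead it assumes the standard characterization of the projection, namely that f - g is orthogonal to w1 and w2, and solves the resulting 2 × 2 system by Cramer's rule. All function names are hypothetical, and the integrals are approximated by Simpson's rule.

```python
import math


def integrate(h, a, b, n=2000):
    """Composite Simpson's rule for h on [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3


def inner(u, v, a, b):
    """Inner product <u, v> = integral of u(x) v(x) over [a, b]."""
    return integrate(lambda x: u(x) * v(x), a, b)


def least_squares_2d(f, w1, w2, a, b):
    """Coefficients (c1, c2) with proj_W f = c1*w1 + c2*w2 for W = span{w1, w2}.

    Solves <f - g, w1> = 0 and <f - g, w2> = 0 by Cramer's rule.
    """
    g11, g12, g22 = inner(w1, w1, a, b), inner(w1, w2, a, b), inner(w2, w2, a, b)
    r1, r2 = inner(f, w1, a, b), inner(f, w2, a, b)
    det = g11 * g22 - g12 * g12
    return (r1 * g22 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det


# Illustrative choice: approximate f(x) = x on [0, 1] from W = span{1, e^x}.
c1, c2 = least_squares_2d(lambda x: x, lambda x: 1.0, math.exp, 0.0, 1.0)
print(c1, c2)
```

For this choice of f and W, working the integrals by hand gives c1 = -1/2 and c2 = 1/(e - 1), which the numerical values should match closely.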



Fourier Series 



A function of the form 

$$T(x) = c_0 + c_1 \cos x + c_2 \cos 2x + \cdots + c_n \cos nx + d_1 \sin x + d_2 \sin 2x + \cdots + d_n \sin nx \tag{2}$$

is called a trigonometric polynomial; if $c_n$ and $d_n$ are not both zero, then T(x) is said to have order n. For example,

$$T(x) = 2 + \cos x - 3 \cos 2x + 7 \sin 4x$$

is a trigonometric polynomial of order 4 with

$$c_0 = 2,\ c_1 = 1,\ c_2 = -3,\ c_3 = 0,\ c_4 = 0,\ d_1 = 0,\ d_2 = 0,\ d_3 = 0,\ d_4 = 7$$

It is evident from (2) that the trigonometric polynomials of order n or less are the various possible linear combinations of

$$1,\ \cos x,\ \cos 2x,\ \ldots,\ \cos nx,\ \sin x,\ \sin 2x,\ \ldots,\ \sin nx \tag{3}$$

It can be shown that these 2n + 1 functions are linearly independent and thus form a basis for a (2n + 1)-dimensional subspace of C[a, b].

Let us now consider the problem of finding the least squares approximation of a continuous function f(x) over the interval [0, 2π] by a trigonometric polynomial of order n or less. As noted above, the least squares approximation to f from W is the orthogonal projection of f on W. To find this orthogonal projection, we must find an orthonormal basis $g_0, g_1, \ldots, g_{2n}$ for W, after which we can compute the orthogonal projection on W from the formula

$$\operatorname{proj}_W f = \langle f, g_0 \rangle g_0 + \langle f, g_1 \rangle g_1 + \cdots + \langle f, g_{2n} \rangle g_{2n} \tag{4}$$

(see Theorem 6.3.4b). An orthonormal basis for W can be obtained by applying the Gram-Schmidt process to the basis vectors in (3) using the inner product

$$\langle u, v \rangle = \int_0^{2\pi} u(x) v(x) \, dx$$




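This yields the orthonormal basis

$$g_0 = \frac{1}{\sqrt{2\pi}},\quad g_1 = \frac{1}{\sqrt{\pi}} \cos x,\ \ldots,\ g_n = \frac{1}{\sqrt{\pi}} \cos nx,\quad g_{n+1} = \frac{1}{\sqrt{\pi}} \sin x,\ \ldots,\ g_{2n} = \frac{1}{\sqrt{\pi}} \sin nx \tag{5}$$

(see Exercise 6). If we introduce the notation

$$a_0 = \frac{2}{\sqrt{2\pi}} \langle f, g_0 \rangle,\quad a_1 = \frac{1}{\sqrt{\pi}} \langle f, g_1 \rangle,\ \ldots,\ a_n = \frac{1}{\sqrt{\pi}} \langle f, g_n \rangle,\quad b_1 = \frac{1}{\sqrt{\pi}} \langle f, g_{n+1} \rangle,\ \ldots,\ b_n = \frac{1}{\sqrt{\pi}} \langle f, g_{2n} \rangle$$

then on substituting (5) in (4), we obtain

$$\operatorname{proj}_W f = \frac{a_0}{2} + [a_1 \cos x + \cdots + a_n \cos nx] + [b_1 \sin x + \cdots + b_n \sin nx]$$

where

$$\begin{aligned}
a_0 &= \frac{2}{\sqrt{2\pi}} \langle f, g_0 \rangle = \frac{2}{\sqrt{2\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{2\pi}} \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \, dx \\
a_1 &= \frac{1}{\sqrt{\pi}} \langle f, g_1 \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \cos x \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos x \, dx \\
&\ \ \vdots \\
a_n &= \frac{1}{\sqrt{\pi}} \langle f, g_n \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \cos nx \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos nx \, dx \\
b_1 &= \frac{1}{\sqrt{\pi}} \langle f, g_{n+1} \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \sin x \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin x \, dx \\
&\ \ \vdots \\
b_n &= \frac{1}{\sqrt{\pi}} \langle f, g_{2n} \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \sin nx \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin nx \, dx
\end{aligned}$$

In short,

$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx \, dx, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx \, dx$$

The numbers $a_0, a_1, \ldots, a_n, b_1, \ldots, b_n$ are called the Fourier coefficients of f.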
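The coefficient formulas can be checked numerically. The sketch below is only an illustration: `fourier_coefficients` and `trig_projection` are hypothetical helper names, the integrals over [0, 2π] are approximated with Simpson's rule, and the test function is chosen to be a trigonometric polynomial so that its coefficients can be read off directly.

```python
import math


def integrate(h, a, b, n=2000):
    """Composite Simpson's rule for h on [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3


def fourier_coefficients(f, n):
    """Return lists a[0..n] and b[0..n] of Fourier coefficients on [0, 2*pi].

    a[k] = (1/pi) * integral of f(x) cos(kx); b[k] uses sin(kx), so b[0] = 0.
    """
    a, b = [], []
    for k in range(n + 1):
        a.append(integrate(lambda x: f(x) * math.cos(k * x), 0.0, 2 * math.pi) / math.pi)
        b.append(integrate(lambda x: f(x) * math.sin(k * x), 0.0, 2 * math.pi) / math.pi)
    return a, b


def trig_projection(a, b):
    """Build the function x -> a0/2 + sum of (a_k cos kx + b_k sin kx)."""
    def g(x):
        return a[0] / 2 + sum(
            a[k] * math.cos(k * x) + b[k] * math.sin(k * x) for k in range(1, len(a))
        )
    return g


# Hypothetical test function: already a trigonometric polynomial of order 2.
f = lambda x: 3 + 2 * math.cos(x) - math.sin(2 * x)
a, b = fourier_coefficients(f, 2)
g = trig_projection(a, b)
print(a, b)
```

Here the constant term 3 appears as a[0] / 2, so a[0] should come out close to 6, with a[1] close to 2 and b[2] close to -1; the projection g then reproduces f up to rounding.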

EXAMPLE 1 Least Squares Approximations

Find the least squares approximation of f(x) = x on [0, 2π] by

(a) a trigonometric polynomial of order 2 or less; 

(b) a trigonometric polynomial of order n or less. 



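Solution

(a)

$$a_0 = \frac{1}{\pi} \int_0^{2\pi} f(x) \, dx = \frac{1}{\pi} \int_0^{2\pi} x \, dx = 2\pi \tag{9a}$$

For k = 1, 2, ..., integration by parts yields (verify)

$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx \, dx = \frac{1}{\pi} \int_0^{2\pi} x \cos kx \, dx = 0 \tag{9b}$$

$$b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx \, dx = \frac{1}{\pi} \int_0^{2\pi} x \sin kx \, dx = -\frac{2}{k} \tag{9c}$$

Thus, the least squares approximation to x on [0, 2π] by a trigonometric polynomial of order 2 or less is

$$x \approx \frac{a_0}{2} + a_1 \cos x + a_2 \cos 2x + b_1 \sin x + b_2 \sin 2x$$

or, from (9a), (9b), and (9c),

$$x \approx \pi - 2 \sin x - \sin 2x$$

(b) The least squares approximation to x on [0, 2π] by a trigonometric polynomial of order n or less is

$$x \approx \frac{a_0}{2} + [a_1 \cos x + \cdots + a_n \cos nx] + [b_1 \sin x + \cdots + b_n \sin nx]$$

or, from (9a), (9b), and (9c),

$$x \approx \pi - 2 \left( \sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n} \right)$$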

The graphs of y = x and some of these approximations are shown in Figure 6.6.4.

Figure 6.6.4 Graphs of y = x and of approximations including $y = \pi - 2 \sin x$ and $y = \pi - 2\left(\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3}\right)$



It is natural to expect that the mean square error will diminish as the number of terms in the least squares approximation

$$f(x) \approx \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)$$

increases. It can be proved that for functions f in C[0, 2π], the mean square error approaches zero as $n \to +\infty$; this is denoted by writing

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx)$$

The right side of this equation is called the Fourier series for f over the interval [0, 2π]. Such series are of major importance in engineering, science, and mathematics.
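The decrease in mean square error can be observed numerically for f(x) = x on [0, 2π]. The sketch below assumes the coefficients that integration by parts gives for this function (a_0 = 2π, a_k = 0, b_k = -2/k), so the order-n approximation is π - 2(sin x + sin 2x / 2 + ... + sin nx / n). The helper names are hypothetical and the integral is approximated by Simpson's rule.

```python
import math


def integrate(h, a, b, n=2000):
    """Composite Simpson's rule for h on [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3


def approximation(n):
    """Order-n least squares approximation of f(x) = x on [0, 2*pi]."""
    return lambda x: math.pi - 2 * sum(math.sin(k * x) / k for k in range(1, n + 1))


def mean_square_error(n):
    """Integral of [x - g_n(x)]^2 over [0, 2*pi]."""
    g = approximation(n)
    return integrate(lambda x: (x - g(x)) ** 2, 0.0, 2 * math.pi)


errors = [mean_square_error(n) for n in (1, 2, 4, 8)]
print(errors)
```

The printed values should decrease as n grows. For n = 1 a direct calculation gives 2π³/3 - 4π, which offers a spot check on the numerical integration.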




Jean Baptiste Fourier (1768-1830) 

Historical Note Fourier was a French mathematician and physicist who discovered 
the Fourier series and related ideas while working on problems of heat diffusion. This 
discovery was one of the most influential in the history of mathematics; it is the 
cornerstone of many fields of mathematical research and a basic tool in many branches
of engineering. Fourier, a political activist during the French revolution, spent time in
jail for his defense of many victims during the Terror. He later became a favorite of
Napoleon and was named a baron.
[Image: The Granger Collection, New York]



Concept Review 

• Approximation of functions 

• Mean square error 

• Least squares approximation 

• Trigonometric polynomial 

• Fourier coefficients 

• Fourier series 

Skills 

• Find the least squares approximation of a function. 

• Find the mean square error of the least squares approximation of a function. 

• Compute the Fourier series of a function. 



Exercise Set 6.6 



1. Find the least squares approximation of f(x) = 1 + x over the interval [0, 2π] by

(a) a trigonometric polynomial of order 2 or less.

(b) a trigonometric polynomial of order n or less.

Answer:

(a) $(1 + \pi) - 2 \sin x - \sin 2x$

(b) $(1 + \pi) - 2\left[\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n}\right]$

2. Find the least squares approximation of f(x) = x over the interval [0, 2π] by

(a) a trigonometric polynomial of order 3 or less.

(b) a trigonometric polynomial of order n or less.

3. (a) Find the least squares approximation of x over the interval [0, 1] by a function of the form $a + be^x$.

(b) Find the mean square error of the approximation.

Answer:

(a) $-\frac{1}{2} + \frac{1}{e - 1} e^x$

(b) $\frac{13}{12} + \frac{1 + e}{2(1 - e)}$

4. (a) Find the least squares approximation of over the interval [0, 1] by a polynomial of the form

(b) Find the mean square error of the approximation.

5. (a) Find the least squares approximation of sin πx over the interval [-1, 1] by a polynomial of the form $a_0 + a_1 x + a_2 x^2$.

(b) Find the mean square error of the approximation.

Answer: 

(a) 

(b) i.A 

6. Use the Gram-Schmidt process to obtain the orthonormal basis (5) from the basis (3).

7. Carry out the integrations indicated in Formulas (9a), (9b), and (9c).

8. Find the Fourier series of f(x) = π - x over the interval [0, 2π].

9. Find the Fourier series of f(x) = 1, 0 < x < π and f(x) = 0, π ≤ x ≤ 2π over the interval [0, 2π].

Answer:

10. What is the Fourier series of sin(3x)?

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) If a function f in C[a, b] is approximated by the function g, then the mean square error is the same as the
area between the graphs of f(x) and g(x) over the interval [a, b].

Answer: 

False 

(b) Given a finite-dimensional subspace W of C[a, b], the function $g = \operatorname{proj}_W f$ minimizes the mean square
error.

Answer: 

True 

(c) {1, cos x, sin x, cos 2x, sin 2x} is an orthogonal subset of the vector space C[0, 2π] with respect to the

Answer:

True

(d) {1, cos x, sin x, cos 2x, sin 2x} is an orthonormal subset of the vector space C[0, 2π] with respect to the

Answer:

False

(e) {1, cos x, sin x, cos 2x, sin 2x} is a linearly independent subset of C[0, 2π].



Answer: 



True 
Chapter 6 Supplementary Exercises



1. Let $R^4$ have the Euclidean inner product.

(a) Find a vector in $R^4$ that is orthogonal to $u_1 = (1, 0, 0, 0)$ and $u_4 = (0, 0, 0, 1)$ and makes equal
angles with $u_2 = (0, 1, 0, 0)$ and $u_3 = (0, 0, 1, 0)$.

(b) Find a vector $x = (x_1, x_2, x_3, x_4)$ of length 1 that is orthogonal to $u_1$ and $u_4$ above and such that the
cosine of the angle between x and $u_2$ is twice the cosine of the angle between x and $u_3$.

Answer:

(a) (0, a, a, 0) with a ≠ 0

2. Prove: If $\langle u, v \rangle$ is the Euclidean inner product on $R^n$ and if A is an n × n matrix, then
[Hint: Use the fact that $\langle u, v \rangle = u \cdot v = v^T u$.]

3. Let $M_{22}$ have the inner product $\langle U, V \rangle = \operatorname{tr}(U^T V) = \operatorname{tr}(V^T U)$ that was defined in Example 6 of
Section 6.1. Describe the orthogonal complement of

(a) the subspace of all diagonal matrices.

(b) the subspace of symmetric matrices.

Answer:

(a) The subspace of all matrices in $M_{22}$ with only zeros on the diagonal.

(b) The subspace of all skew-symmetric matrices in $M_{22}$.

4. Let Ax = 0 be a system of m equations in n unknowns. Show that

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

is a solution of this system if and only if the vector $x = (x_1, x_2, \ldots, x_n)$ is orthogonal to every row vector
of A with respect to the Euclidean inner product on $R^n$.

5. Use the Cauchy-Schwarz inequality to show that if $a_1, a_2, \ldots, a_n$ are positive real numbers, then

$$(a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \ge n^2$$

6. Show that if x and y are vectors in an inner product space and c is any scalar, then

$$\|cx + y\|^2 = c^2 \|x\|^2 + 2c \langle x, y \rangle + \|y\|^2$$

7. Let $R^3$ have the Euclidean inner product. Find two vectors of length 1 that are orthogonal to all three of
the vectors $u_1 = (1, 1, -1)$, $u_2 = (-2, -1, 2)$, and $u_3 = (-1, 0, 1)$.

Answer: 

8. Find a weighted Euclidean inner product on $R^n$ such that the vectors

$$\begin{aligned}
v_1 &= (1, 0, 0, \ldots, 0) \\
v_2 &= (0, \sqrt{2}, 0, \ldots, 0) \\
v_3 &= (0, 0, \sqrt{3}, \ldots, 0) \\
&\ \ \vdots \\
v_n &= (0, 0, 0, \ldots, \sqrt{n})
\end{aligned}$$

form an orthonormal set.

9. Is there a weighted Euclidean inner product on $R^2$ for which the vectors (1, 2) and (3, -1) form an
orthonormal set? Justify your answer.

Answer: 

No 

10. If u and v are vectors in an inner product space V, then u, v, and u - v can be regarded as sides of a
"triangle" in V (see the accompanying figure). Prove that the law of cosines holds for any such triangle;
that is,

$$\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2 \|u\| \|v\| \cos \theta$$

where θ is the angle between u and v.

Figure Ex-10

11. (a) As shown in Figure 3.2.6, the vectors (k, 0, 0), (0, k, 0), and (0, 0, k) form the edges of a cube in $R^3$
with diagonal (k, k, k). Similarly, the vectors

$$(k, 0, 0, \ldots, 0),\ (0, k, 0, \ldots, 0),\ \ldots,\ (0, 0, 0, \ldots, k)$$

can be regarded as edges of a "cube" in $R^n$ with diagonal (k, k, k, ..., k). Show that each of the above
edges makes an angle of θ with the diagonal, where $\cos \theta = 1/\sqrt{n}$.

(b) Calculus required What happens to the angle θ in part (a) as the dimension of $R^n$ approaches $+\infty$?

Answer:

(b) θ approaches π/2

12. Let u and v be vectors in an inner product space. 

(a) Prove that $\|u\| = \|v\|$ if and only if u + v and u - v are orthogonal.

(b) Give a geometric interpretation of this result in with the Euclidean inner product. 

13. Let u be a vector in an inner product space V, and let $\{v_1, v_2, \ldots, v_n\}$ be an orthonormal basis for V.
Show that if $\alpha_i$ is the angle between u and $v_i$, then

$$\cos^2 \alpha_1 + \cos^2 \alpha_2 + \cdots + \cos^2 \alpha_n = 1$$

14. Prove: If $\langle u, v \rangle_1$ and $\langle u, v \rangle_2$ are two inner products on a vector space V, then the quantity
$\langle u, v \rangle = \langle u, v \rangle_1 + \langle u, v \rangle_2$ is also an inner product.

15. Prove Theorem 6.2.5.

16. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then
the least squares solution of Ax = b is x = 0.

17. Is there any value of s for which $x_1 = 1$ and $x_2 = 2$ is the least squares solution of the following linear
system?

$$\begin{aligned}
x_1 - x_2 &= 1 \\
2x_1 + 3x_2 &= 1 \\
4x_1 + 5x_2 &= s
\end{aligned}$$

Explain your reasoning.

Answer:

No

18. Show that if p and q are distinct positive integers, then the functions f(x) = sin px and g(x) = sin qx are
orthogonal with respect to the inner product

19. Show that if p and q are positive integers, then the functions f(x) = cos px and g(x) = sin qx are
orthogonal with respect to the inner product
CHAPTER 7

Diagonalization and Quadratic Forms



CHAPTER CONTENTS 

7.1. Orthogonal Matrices 

7.2. Orthogonal Diagonalization 

7.3. Quadratic Forms 

7.4. Optimization Using Quadratic Forms 

7.5. Hermitian, Unitary, and Normal Matrices 



INTRODUCTION

In Section 5.2 we found conditions that guaranteed the diagonalizability of an n × n
matrix, but we did not consider what class or classes of matrices might actually satisfy
those conditions. In this chapter we will show that every symmetric matrix is
diagonalizable. This is an extremely important result because many applications utilize it
in some essential way.



7.1 Orthogonal Matrices 



In this section we will discuss the class of matrices whose inverses can be obtained by transposition. Such matrices occur in a variety of
applications and also arise as transition matrices when one orthonormal basis is changed to another.



Orthogonal Matrices 

We begin with the following definition. 



DEFINITION 1 

A square matrix A is said to be orthogonal if its transpose is the same as its inverse, that is, if

$$A^{-1} = A^T$$

or, equivalently, if

$$AA^T = A^TA = I \tag{1}$$

Recall from Theorem 1.6.3 that if either product in (1) holds, then
so does the other. Thus, A is orthogonal if either $AA^T = I$ or $A^TA = I$.


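EXAMPLE 1 A 3 × 3 Orthogonal Matrix

The matrix

$$A = \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix}$$

is orthogonal since

$$A^TA = \begin{bmatrix} \frac{3}{7} & -\frac{6}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{3}{7} & \frac{6}{7} \\ \frac{6}{7} & \frac{2}{7} & -\frac{3}{7} \end{bmatrix} \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$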



EXAMPLE 2 Rotation and Reflection Matrices are Orthogonal

Recall from Table 5 of Section 4.9 that the standard matrix for the counterclockwise rotation of $R^2$ through an angle θ is

$$A = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$$

This matrix is orthogonal for all choices of θ since

$$A^TA = \begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

We leave it for you to verify that the reflection matrices in Tables 1 and 2 and the rotation matrices in Table 6 of
Section 4.9 are all orthogonal.



Observe that for the orthogonal matrices in Example 1 and Example 2, both the row vectors and the column vectors form orthonormal sets with 
respect to the Euclidean inner product. This is a consequence of the following theorem. 



THEOREM 7.1.1 

The following are equivalent for an n × n matrix A.

(a) A is orthogonal.

(b) The row vectors of A form an orthonormal set in $R^n$ with the Euclidean inner product.

(c) The column vectors of A form an orthonormal set in $R^n$ with the Euclidean inner product.



Proof We will prove the equivalence of (a) and (b) and leave the equivalence of (a) and (c) as an exercise.

(a) ⇔ (b) The entry in the ith row and jth column of the matrix product $AA^T$ is the dot product of the ith row vector of A and the jth column
vector of $A^T$ (see Formula 5 of Section 1.3). But except for a difference in form, the jth column vector of $A^T$ is the jth row vector of A. Thus, if the
row vectors of A are $r_1, r_2, \ldots, r_n$, then the matrix product $AA^T$ can be expressed as

$$AA^T = \begin{bmatrix} r_1 \cdot r_1 & r_1 \cdot r_2 & \cdots & r_1 \cdot r_n \\ r_2 \cdot r_1 & r_2 \cdot r_2 & \cdots & r_2 \cdot r_n \\ \vdots & \vdots & & \vdots \\ r_n \cdot r_1 & r_n \cdot r_2 & \cdots & r_n \cdot r_n \end{bmatrix}$$

[see Formula 28 of Section 3.2]. Thus, it follows that $AA^T = I$ if and only if

$$r_1 \cdot r_1 = r_2 \cdot r_2 = \cdots = r_n \cdot r_n = 1$$

and

$$r_i \cdot r_j = 0 \quad \text{when } i \ne j$$

which are true if and only if $\{r_1, r_2, \ldots, r_n\}$ is an orthonormal set in $R^n$.
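The criterion in the proof translates directly into a check: form $AA^T$, whose entries are the dot products of the rows, and compare it with the identity. The sketch below is illustrative only; `is_orthogonal` and the other helpers are hypothetical names, the sample matrices are chosen for the illustration, and exact rational arithmetic is used so that no rounding tolerance is needed.

```python
from fractions import Fraction as F


def transpose(m):
    """Rows become columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product; entry (i, j) is the dot product of row i of a and column j of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def is_orthogonal(m):
    """True when m is square and m times its transpose equals the identity."""
    n = len(m)
    return all(len(row) == n for row in m) and matmul(m, transpose(m)) == identity(n)


# Illustrative matrix whose rows are orthonormal (entries are sevenths).
A = [[F(3, 7), F(2, 7), F(6, 7)],
     [F(-6, 7), F(3, 7), F(2, 7)],
     [F(2, 7), F(6, 7), F(-3, 7)]]

# Rows are orthogonal to each other but have length sqrt(5), not 1.
B = [[1, -2],
     [2, 1]]

print(is_orthogonal(A), is_orthogonal(B))
```

The second matrix shows why the rows must be orthonormal rather than merely orthogonal: its product with its transpose is 5I, not I.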



WARNING

Note that an orthogonal matrix is one with orthonormal rows and columns, not simply orthogonal rows and columns.

The following theorem lists three more fundamental properties of orthogonal matrices. The proofs are all straightforward and are left as exercises.

THEOREM 7.1.2

(a) The inverse of an orthogonal matrix is orthogonal.

(b) A product of orthogonal matrices is orthogonal.

(c) If A is orthogonal, then det(A) = 1 or det(A) = -1.

EXAMPLE 3 det(A) = ±1 for an Orthogonal Matrix A

The matrix

$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$

is orthogonal since its row (and column) vectors form orthonormal sets in $R^2$ with the Euclidean inner product. We leave it for you
to verify that det(A) = 1 and that interchanging the rows produces an orthogonal matrix whose determinant is -1.
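A quick numerical check of the determinant property is sketched below. The matrix with entries ±1/√2 and the helper names `det2` and `matmul` are assumptions made for the illustration; floating-point results agree with ±1 only up to rounding.

```python
import math


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


s = 1 / math.sqrt(2)
A = [[s, s], [-s, s]]      # orthonormal rows and columns
B = [A[1], A[0]]           # the same rows, interchanged

det_A = det2(A)            # close to 1
det_B = det2(B)            # close to -1
det_AB = det2(matmul(A, B))  # product of orthogonal matrices: close to -1
print(det_A, det_B, det_AB)
```

The last value also illustrates part (b) of Theorem 7.1.2 indirectly: the product AB is again orthogonal, so its determinant must be 1 or -1.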



Orthogonal Matrices as Linear Operators 



We observed in Example 2 that the standard matrices for the basic reflection and rotation operators on $R^2$ and $R^3$ are orthogonal. The next theorem
will explain why this is so.

THEOREM 7.1.3

If A is an n × n matrix, then the following are equivalent.

(a) A is orthogonal.

(b) $\|Ax\| = \|x\|$ for all x in $R^n$.

(c) $Ax \cdot Ay = x \cdot y$ for all x and y in $R^n$.



Proof We will prove the sequence of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).

(a) ⇒ (b) Assume that A is orthogonal, so that $A^TA = I$. It follows from Formula 26 of Section 3.2 that

$$\|Ax\| = (Ax \cdot Ax)^{1/2} = (x \cdot A^TAx)^{1/2} = (x \cdot x)^{1/2} = \|x\|$$

(b) ⇒ (c) Assume that $\|Ax\| = \|x\|$ for all x in $R^n$. From Theorem 3.2.7 we have

$$\begin{aligned}
Ax \cdot Ay &= \tfrac{1}{4} \|Ax + Ay\|^2 - \tfrac{1}{4} \|Ax - Ay\|^2 = \tfrac{1}{4} \|A(x + y)\|^2 - \tfrac{1}{4} \|A(x - y)\|^2 \\
&= \tfrac{1}{4} \|x + y\|^2 - \tfrac{1}{4} \|x - y\|^2 = x \cdot y
\end{aligned}$$

(c) ⇒ (a) Assume that $Ax \cdot Ay = x \cdot y$ for all x and y in $R^n$. It follows from Formula 26 of Section 3.2 that

$$x \cdot y = x \cdot A^TAy$$

which can be rewritten as $x \cdot (A^TAy - y) = 0$ or as

$$x \cdot (A^TA - I)y = 0$$

Since this equation holds for all x in $R^n$, it holds in particular if $x = (A^TA - I)y$, so

$$(A^TA - I)y \cdot (A^TA - I)y = 0$$

Thus, it follows from the positivity axiom for inner products that

$$(A^TA - I)y = 0$$

Since this equation is satisfied by every vector y in $R^n$, it must be that $A^TA - I$ is the zero matrix (why?) and hence that $A^TA = I$. Thus, A is
orthogonal.

Theorem 7.1.3 has a useful geometric interpretation when considered from the viewpoint of matrix transformations: If A is an orthogonal matrix
and $T_A: R^n \to R^n$ is multiplication by A, then we will call $T_A$ an orthogonal operator on $R^n$. It follows from parts (a) and (b) of Theorem 7.1.3
that the orthogonal operators on $R^n$ are precisely those operators that leave the lengths of all vectors unchanged. This explains why, in Example 2,
we found the standard matrices for the basic reflections and rotations of $R^2$ and $R^3$ to be orthogonal.

Parts (a) and (c) of Theorem 7.1.3 imply that orthogonal
operators leave the angle between two vectors unchanged. Why?
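The length- and angle-preserving behavior can be observed numerically. The sketch below uses a 2 × 2 rotation matrix as a sample orthogonal matrix; the angle 0.7, the two test vectors, and all helper names are arbitrary choices made for the illustration.

```python
import math


def rotation(theta):
    """Standard matrix for counterclockwise rotation of the plane through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def apply(m, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def angle(u, v):
    """Angle between nonzero vectors, from cos(angle) = u.v / (|u| |v|)."""
    return math.acos(dot(u, v) / (norm(u) * norm(v)))


A = rotation(0.7)
x, y = [3.0, -1.0], [2.0, 5.0]
Ax, Ay = apply(A, x), apply(A, y)

print(norm(x), norm(Ax))             # lengths agree up to rounding
print(dot(x, y), dot(Ax, Ay))        # dot products agree up to rounding
print(angle(x, y), angle(Ax, Ay))    # hence the angles agree as well
```

Since the angle is computed only from dot products and norms, preserving those two quantities forces the angle to be preserved, which is one way to answer the "Why?" above.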



Change of Orthonormal Basis 



Orthonormal bases for inner product spaces are convenient because, as the following theorem shows, many familiar formulas hold for such bases. 
We leave the proof as an exercise. 

THEOREM 7.1.4 

If S is an orthonormal basis for an n-dimensional inner product space V, and if

$$(u)_S = (u_1, u_2, \ldots, u_n) \quad \text{and} \quad (v)_S = (v_1, v_2, \ldots, v_n)$$

then:

(a) $\|u\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$

(b) $d(u, v) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$

(c) $\langle u, v \rangle = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$

Remark Note that the three parts of Theorem 7.1.4 can be expressed as

$$\|u\| = \|(u)_S\| \qquad d(u, v) = d((u)_S, (v)_S) \qquad \langle u, v \rangle = \langle (u)_S, (v)_S \rangle$$

where the norm, distance, and inner product on the left sides are relative to the inner product on V and on the right sides are relative to the
Euclidean inner product on $R^n$.

Transitions between orthonormal bases for an inner product space are of special importance in geometry and various applications. The following 
theorem, whose proof is deferred to the end of this section, is concerned with transitions of this type. 

THEOREM 7.1.5 

Let V be a finite-dimensional inner product space. If P is the transition matrix from one orthonormal basis for V to another orthonormal
basis for V, then P is an orthogonal matrix.

EXAMPLE 4 Rotation of Axes in 2-Space

In many problems a rectangular xy-coordinate system is given, and a new x'y'-coordinate system is obtained by rotating the
xy-system counterclockwise about the origin through an angle θ. When this is done, each point Q in the plane has two sets of
coordinates: coordinates (x, y) relative to the xy-system and coordinates (x', y') relative to the x'y'-system (Figure 7.1.1a).

Figure 7.1.1

By introducing unit vectors $u_1$ and $u_2$ along the positive x- and y-axes and unit vectors $u_1'$ and $u_2'$ along the positive x'- and y'-axes,
we can regard this rotation as a change from an old basis $B = \{u_1, u_2\}$ to a new basis $B' = \{u_1', u_2'\}$ (Figure 7.1.1b). Thus, the new
coordinates (x', y') and the old coordinates (x, y) of a point Q will be related by

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = P^{-1} \begin{bmatrix} x \\ y \end{bmatrix} \tag{2}$$

where P is the transition matrix from B' to B. To find P we must determine the coordinate matrices of the new basis vectors $u_1'$ and $u_2'$
relative to the old basis. As indicated in Figure 7.1.1c, the components of $u_1'$ in the old basis are cos θ and sin θ, so

$$[u_1']_B = \begin{bmatrix} \cos \theta \\ \sin \theta \end{bmatrix}$$

Similarly, from Figure 7.1.1d, we see that the components of $u_2'$ in the old basis are cos(θ + π/2) = -sin θ and
sin(θ + π/2) = cos θ, so

$$[u_2']_B = \begin{bmatrix} -\sin \theta \\ \cos \theta \end{bmatrix}$$

Thus the transition matrix from B' to B is

$$P = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \tag{3}$$

Observe that P is an orthogonal matrix, as expected, since B and B' are orthonormal bases. Thus

$$P^{-1} = P^T = \begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix}$$

so (2) yields

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{4}$$

or, equivalently,

$$\begin{aligned} x' &= x \cos \theta + y \sin \theta \\ y' &= -x \sin \theta + y \cos \theta \end{aligned} \tag{5}$$

These are sometimes called the rotation equations for $R^2$.



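EXAMPLE 5 Rotation of Axes in 2-Space

Use form (4) of the rotation equations for $R^2$ to find the new coordinates of the point Q(2, -1) if the coordinate axes of a rectangular
coordinate system are rotated through an angle of θ = π/4.

Solution Since

$$\sin \frac{\pi}{4} = \cos \frac{\pi}{4} = \frac{1}{\sqrt{2}}$$

the equation in (4) becomes

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

Thus, if the old coordinates of a point Q are (x, y) = (2, -1), then

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{3}{\sqrt{2}} \end{bmatrix}$$

so the new coordinates of Q are $(x', y') = \left(\frac{1}{\sqrt{2}}, -\frac{3}{\sqrt{2}}\right)$.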
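The rotation equations are easy to put into code. The sketch below implements the component form x' = x cos θ + y sin θ, y' = -x sin θ + y cos θ, together with the inverse change of coordinates obtained from the transpose of the coefficient matrix. The function names are hypothetical.

```python
import math


def rotate_axes(x, y, theta):
    """New coordinates (x', y') after the axes are rotated counterclockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + y * s, -x * s + y * c)


def unrotate_axes(xp, yp, theta):
    """Old coordinates (x, y) recovered from (x', y'); uses the transposed matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return (xp * c - yp * s, xp * s + yp * c)


new = rotate_axes(2.0, -1.0, math.pi / 4)
old = unrotate_axes(new[0], new[1], math.pi / 4)
print(new)   # approximately (0.7071, -2.1213), that is, (1/sqrt(2), -3/sqrt(2))
print(old)   # approximately (2.0, -1.0)
```

Because the coefficient matrix is orthogonal, no matrix inversion is needed to go back: transposing it is enough.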



Remark Observe that the coefficient matrix in (4) is the same as the standard matrix for the linear operator that rotates the vectors of $R^2$ through
the angle -θ (see margin note for Table 5 of Section 4.9). This is to be expected since rotating the coordinate axes through the angle θ with the
vectors of $R^2$ kept fixed has the same effect as rotating the vectors in $R^2$ through the angle -θ with the axes kept fixed.



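EXAMPLE 6 Application to Rotation of Axes in 3-Space

Suppose that a rectangular xyz-coordinate system is rotated around its z-axis counterclockwise (looking down the positive z-axis)
through an angle θ (Figure 7.1.2). If we introduce unit vectors $u_1$, $u_2$, and $u_3$ along the positive x-, y-, and z-axes and unit vectors $u_1'$,
$u_2'$, and $u_3'$ along the positive x'-, y'-, and z'-axes, we can regard the rotation as a change from the old basis $B = \{u_1, u_2, u_3\}$ to the
new basis $B' = \{u_1', u_2', u_3'\}$. In light of Example 4, it should be evident that

$$[u_1']_B = \begin{bmatrix} \cos \theta \\ \sin \theta \\ 0 \end{bmatrix} \quad \text{and} \quad [u_2']_B = \begin{bmatrix} -\sin \theta \\ \cos \theta \\ 0 \end{bmatrix}$$

Moreover, since $u_3'$ extends 1 unit up the positive z'-axis,

$$[u_3']_B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Figure 7.1.2

It follows that the transition matrix from B' to B is

$$P = \begin{bmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and the transition matrix from B to B' is

$$P^{-1} = \begin{bmatrix} \cos \theta & \sin \theta & 0 \\ -\sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(verify). Thus, the new coordinates (x', y', z') of a point Q can be computed from its old coordinates (x, y, z) by

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos \theta & \sin \theta & 0 \\ -\sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$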



OPTIONAL 

We conclude this section with an optional proof of Theorem 7.1.5. 

Proof of Theorem 7.1.5 Assume that V is an n-dimensional inner product space and that P is the transition matrix from an orthonormal basis
B' to an orthonormal basis B. We will denote the norm relative to the inner product on V by the symbol $\| \ \|_V$ to distinguish it from the norm
relative to the Euclidean inner product on $R^n$, which we will denote by $\| \ \|$.

Recall that $(u)_S$ denotes a coordinate vector expressed in
comma-delimited form whereas $[u]_S$ denotes a coordinate vector
expressed in column form.

To prove that P is orthogonal, we will use Theorem 7.1.3 and show that $\|Px\| = \|x\|$ for every vector x in $R^n$. As a first step in this direction,
recall from Theorem 7.1.4a that for any orthonormal basis for V the norm of any vector u in V is the same as the norm of its coordinate vector with
respect to the Euclidean inner product, that is

$$\|u\|_V = \|[u]_{B'}\| = \|[u]_B\|$$

or

$$\|u\|_V = \|[u]_{B'}\| = \|P[u]_{B'}\| \tag{6}$$

Now let x be any vector in $R^n$, and let u be the vector in V whose coordinate vector with respect to the basis B' is x; that is, $[u]_{B'} = x$. Thus, from
(6),

$$\|u\|_V = \|x\| = \|Px\|$$

which proves that P is orthogonal.



Concept Review 

• Orthogonal matrix 

• Orthogonal operator 

• Properties of orthogonal matrices. 

• Geometric properties of an orthogonal operator 

• Properties of transition matrices from one orthonormal basis to another. 
Skills 

• Be able to identify an orthogonal matrix. 

• Know the possible values for the determinant of an orthogonal matrix. 

• Find the new coordinates of a point resulting from a rotation of axes. 



Exercise Set 7.1 

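1. (a) Show that the matrix

$$A = \begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix}$$

is orthogonal in three ways: by calculating $A^TA$, by using part (b) of Theorem 7.1.1, and by using part (c) of Theorem 7.1.1.

(b) Find the inverse of the matrix A in part (a).

Answer:

(b)

$$\begin{bmatrix} \frac{4}{5} & -\frac{9}{25} & \frac{12}{25} \\ 0 & \frac{4}{5} & \frac{3}{5} \\ -\frac{3}{5} & -\frac{12}{25} & \frac{16}{25} \end{bmatrix}$$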

2. (a) Show that the matrix

A =

is orthogonal.

(b) Let $T: R^3 \to R^3$ be multiplication by the matrix A in part (a). Find T(x) for the vector x = (-2, 3, 5). Using the Euclidean inner product
on $R^3$, verify that $\|T(x)\| = \|x\|$.

3. Determine which of the following matrices are orthogonal. For those that are orthogonal, find the inverse, 
(a) 



(c) 



(d) 



(e) 



(f) 



1 0" 




0 1 




1 


1 






1 


1 


f2 


f2 


0 1 


1 




1 0 


0 


0 0 


1 



._L _L 

{i {I {i 

J_ J_ 

1 1.1. 

{i {I {i 



0 — 



0 

i 



1 i 

2 2 
i i 
6 6 

i -1 

6 6 

5 i 
"6 6 

0 0 

4« 



0 1 



4 0 



Answer; 
(a) 



ri 01 

.0 ij 



(b) 



(d) 



J L 

.J L 

72 1/2 



1 


0 


1 








1 


2 


1 








1 


1 


1 









(e) 



1 1 

2 2 

1 _5 

2 6 

1 i 

2 6 

1 i 

2 6 



4. Prove that if A is orthogonal, then is orthogonal. 

5. Verify that the reflection matrices in Tables Table 1 and Table 2 of Section 4.9 are orthogonal. 

6. Let a rectangular x'y'-coordinate system be obtained by rotating a rectangular xy-coordinate system counterclockwise through the angle

(a) Find the x'y'-coordinates of the point whose xy-coordinates are (-2, 6).

(b) Find the xy-coordinates of the point whose x'y'-coordinates are (5, 2).

7. Repeat Exercise 6 with θ = π/3.

Answer:

(a) $(-1 + 3\sqrt{3},\ 3 + \sqrt{3})$



8. Let a rectangular x'y'z'-coordinate system be obtained by rotating a rectangular xyz-coordinate system counterclockwise about the z-axis
(looking down the z-axis) through the angle θ = π/4.

(a) Find the x'y'z'-coordinates of the point whose xyz-coordinates are (-1, 2, 5).

(b) Find the xyz-coordinates of the point whose x'y'z'-coordinates are (1, 6, -3).

9. Repeat Exercise 8 for a rotation of θ = π/3 counterclockwise about the y-axis (looking along the positive y-axis toward the origin).
Answer: 

,., (4-1^,2, i-i/j) 

10. Repeat Exercise 8 for a rotation of θ = 3π/4 counterclockwise about the x-axis (looking along the positive x-axis toward the origin).

11. (a) A rectangular x'y'z'-coordinate system is obtained by rotating an xyz-coordinate system counterclockwise about the y-axis through an
angle θ (looking along the positive y-axis toward the origin). Find a matrix A such that

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

where (x, y, z) and (x', y', z') are the coordinates of the same point in the xyz- and x'y'z'-systems, respectively.

(b) Repeat part (a) for a rotation about the x-axis.

Answer:

(a)

$$A = \begin{bmatrix} \cos \theta & 0 & -\sin \theta \\ 0 & 1 & 0 \\ \sin \theta & 0 & \cos \theta \end{bmatrix}$$

(b)

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos \theta & \sin \theta \\ 0 & -\sin \theta & \cos \theta \end{bmatrix}$$



12. A rectangular x''y''z''-coordinate system is obtained by first rotating a rectangular xyz-coordinate system 60° counterclockwise about the
z-axis (looking down the positive z-axis) to obtain an x'y'z'-coordinate system, and then rotating the x'y'z'-coordinate system 45°
counterclockwise about the y'-axis (looking along the positive y'-axis toward the origin). Find a matrix A such that

$$\begin{bmatrix} x'' \\ y'' \\ z'' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

where (x, y, z) and (x'', y'', z'') are the xyz- and x''y''z''-coordinates of the same point.

13. What conditions must a and b satisfy for the matrix

$$\begin{bmatrix} a + b & b - a \\ a - b & b + a \end{bmatrix}$$

to be orthogonal?

Answer:

14. Prove that a 2 × 2 orthogonal matrix A has only one of two possible forms:

$$A = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \quad \text{or} \quad A = \begin{bmatrix} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{bmatrix}$$

where 0 ≤ θ < 2π. [Hint: Start with a general 2 × 2 matrix $A = (a_{ij})$, and use the fact that the column vectors form an orthonormal set in $R^2$.]

15. (a) Use the result in Exercise 14 to prove that multiplication by a 2 × 2 orthogonal matrix is either a rotation or a reflection about the x-axis followed by a
rotation.

(b) Prove that multiplication by A is a rotation if det(A) = 1 and a reflection followed by a rotation if det(A) = -1.

16. Use the result in Exercise 15 to determine whether multiplication by A is a rotation or a reflection about the x-axis followed by a rotation.
Find the angle of rotation in either case.



(a) 



A = 



(b) 



A = 



L 4^ 

L L 

-f2 -f2 

2 2 
2 2 



17. Find a, b, and c for which the matrix 

J_ _J_ 

is orthogonal. Are the values of a, b, and c unique? Explain. 
Answer: 

The only possibilities are ~ ^ ~ "" ^ ~ or ^ ~ ^ ~ . 

18. The result in Exercise 15 has an analog for 3 × 3 orthogonal matrices: It can be proved that multiplication by a 3 × 3 orthogonal matrix A is a
rotation about some axis if det(A) = 1 and is a rotation about some axis followed by a reflection about some coordinate plane if det(A) = -1.
Determine whether multiplication by A is a rotation or a rotation followed by a reflection.



(a) 



A = 



1 2 
7 7 

§. 1 

"7 7 

2 6 
7 7 



A = 



(b) 2 3 6 

7 7 7 
3 _6 2 
7 7 7 
^ 2 _3 
7 7 7 

19. Use the fact stated in Exercise 18 and part (b) of Theorem 7.1.2 to show that a composition of rotations can always be accomplished by a single 
rotation about some appropriate axis. 

20. Prove the equivalence of statements (a) and (c) in Theorem 7.1.1. 

21. A linear operator on $R^2$ is called rigid if it does not change the lengths of vectors, and it is called angle preserving if it does not change the
angle between nonzero vectors.

(a) Name two different types of linear operators that are rigid.

(b) Name two different types of linear operators that are angle preserving.

(c) Are there any linear operators on $R^2$ that are rigid and not angle preserving? Angle preserving and not rigid? Justify your answer.
Answer: 

(a) Rotations about the origin, reflections about any line through the origin, and any combination of these 

(b) Rotation about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these 

(c) No; dilations and contractions 

True-False Exercises 

In parts (a)-(h) determine whether the statement is true or false, and justify your answer.

(a) The matrix

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$$

is orthogonal.

Answer:

False

(b) The matrix

$$\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$$

is orthogonal.

Answer:

False

(c) An m × n matrix A is orthogonal if $A^TA = I$.
Answer: 

False 

(d) A square matrix whose columns form an orthogonal set is orthogonal. 
Answer: 

False 

(e) Every orthogonal matrix is invertible. 
Answer: 

True 

(f) If A is an orthogonal matrix, then is orthogonal and (det = 1. 
Answer: 



True 

(g) Every eigenvalue of an orthogonal matrix has absolute value 1 . 



Answer: 

True 

(h) If A is a square matrix and $\|Au\| = 1$ for all unit vectors u, then A is orthogonal.
Answer: 
True 

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



7.2 Orthogonal Diagonalization 

In this section we will be concerned with the problem of diagonalizing a symmetric matrix A. As we will see, this problem is
closely related to that of finding an orthonormal basis for $R^n$ that consists of eigenvectors of A. Problems of this type are
important because many of the matrices that arise in applications are symmetric.



The Orthogonal Diagonalization Problem 

In Definition 1 of Section 5.2 we defined two square matrices, A and B, to be similar if there is an invertible matrix P such
that $P^{-1}AP = B$. In this section we will be concerned with the special case in which it is possible to find an orthogonal
matrix P for which this relationship holds.

We begin with the following definition. 

DEFINITION 1

If $A$ and $B$ are square matrices, then we say that $A$ and $B$ are orthogonally similar if there is an orthogonal matrix $P$
such that $P^TAP = B$.



If $A$ is orthogonally similar to some diagonal matrix, say

$$P^TAP = D$$

then we say that $A$ is orthogonally diagonalizable and that $P$ orthogonally diagonalizes $A$.

Our first goal in this section is to determine what conditions a matrix must satisfy to be orthogonally diagonalizable. As a
first step, observe that there is no hope of orthogonally diagonalizing a matrix that is not symmetric. To see why this is so,
suppose that

$$P^TAP = D \tag{1}$$

where $P$ is an orthogonal matrix and $D$ is a diagonal matrix. Multiplying the left side of (1) by $P$ and the right side by $P^T$, and then
using the fact that $PP^T = P^TP = I$, we can rewrite this equation as

$$A = PDP^T \tag{2}$$

Now transposing both sides of this equation and using the fact that a diagonal matrix is the same as its transpose, we obtain

$$A^T = (PDP^T)^T = (P^T)^TD^TP^T = PDP^T = A$$

so $A$ must be symmetric.



Conditions for Orthogonal Diagonalizability 

The following theorem shows that every symmetric matrix is, in fact, orthogonally diagonalizable. In this theorem, and for
the remainder of this section, orthogonal will mean orthogonal with respect to the Euclidean inner product on $R^n$.



THEOREM 7.2.1 



If $A$ is an $n \times n$ matrix, then the following are equivalent.

(a) $A$ is orthogonally diagonalizable.

(b) $A$ has an orthonormal set of $n$ eigenvectors.

(c) $A$ is symmetric.



Proof 

(a) ⇒ (b) Since $A$ is orthogonally diagonalizable, there is an orthogonal matrix $P$ such that $P^{-1}AP$ is diagonal. As shown in
the proof of Theorem 5.2.1, the $n$ column vectors of $P$ are eigenvectors of $A$. Since $P$ is orthogonal, these column vectors are
orthonormal, so $A$ has $n$ orthonormal eigenvectors.

(b) ⇒ (a) Assume that $A$ has an orthonormal set of $n$ eigenvectors $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}$. As shown in the proof of Theorem 5.2.1,
the matrix $P$ with these eigenvectors as columns diagonalizes $A$. Since these eigenvectors are orthonormal, $P$ is orthogonal
and thus orthogonally diagonalizes $A$.

(a) ⇒ (c) In the proof that (a) ⇒ (b) we showed that an orthogonally diagonalizable $n \times n$ matrix $A$ is orthogonally
diagonalized by an $n \times n$ matrix $P$ whose columns form an orthonormal set of eigenvectors of $A$. Let $D$ be the diagonal
matrix

$$D = P^TAP$$

from which it follows that

$$A = PDP^T$$

Thus,

$$A^T = (PDP^T)^T = PD^TP^T = PDP^T = A$$

which shows that $A$ is symmetric.

(c) ⇒ (a) The proof of this part is beyond the scope of this text and will be omitted.



Properties of Symmetric Matrices 

Our next goal is to devise a procedure for orthogonally diagonalizing a symmetric matrix, but before we can do so, we need 
the following critical theorem about eigenvalues and eigenvectors of symmetric matrices. 

THEOREM 7.2.2 

If $A$ is a symmetric matrix, then:

(a) The eigenvalues of $A$ are all real numbers.

(b) Eigenvectors from different eigenspaces are orthogonal.

Part (a), which requires results about complex vector spaces, will be discussed in Section 7.5.



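Proof (b) Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix $A$. We want to show
that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. Our proof of this involves the trick of starting with the expression $A\mathbf{v}_1 \cdot \mathbf{v}_2$. It follows from Formula 26 of
Section 3.2 and the symmetry of $A$ that

$$A\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot A^T\mathbf{v}_2 = \mathbf{v}_1 \cdot A\mathbf{v}_2 \tag{3}$$

But $\mathbf{v}_1$ is an eigenvector of $A$ corresponding to $\lambda_1$, and $\mathbf{v}_2$ is an eigenvector of $A$ corresponding to $\lambda_2$, so (3) yields the
relationship

$$\lambda_1\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot \lambda_2\mathbf{v}_2$$

which can be rewritten as

$$(\lambda_1 - \lambda_2)(\mathbf{v}_1 \cdot \mathbf{v}_2) = 0 \tag{4}$$

But $\lambda_1 - \lambda_2 \neq 0$, since $\lambda_1$ and $\lambda_2$ were assumed distinct. Thus, it follows from (4) that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$.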

Theorem 7.2.2 yields the following procedure for orthogonally diagonalizing a symmetric matrix. 

Orthogonally Diagonalizing an n × n Symmetric Matrix

Step 1 Find a basis for each eigenspace of $A$.

Step 2 Apply the Gram-Schmidt process to each of these bases to obtain an orthonormal basis for each eigenspace.

Step 3 Form the matrix $P$ whose columns are the vectors constructed in Step 2. This matrix will orthogonally
diagonalize $A$, and the eigenvalues on the diagonal of $D = P^TAP$ will be in the same order as their corresponding
eigenvectors in $P$.



Remark The justification of this procedure should be clear: Theorem 7.2.2 ensures that eigenvectors from different
eigenspaces are orthogonal, and applying the Gram-Schmidt process ensures that the eigenvectors within the same
eigenspace are orthonormal. It follows that the entire set of eigenvectors obtained by this procedure will be orthonormal.



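EXAMPLE 1 Orthogonally Diagonalizing a Symmetric Matrix

Find an orthogonal matrix $P$ that diagonalizes

$$A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$$

Solution We leave it for you to verify that the characteristic equation of $A$ is

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - 4 & -2 & -2 \\ -2 & \lambda - 4 & -2 \\ -2 & -2 & \lambda - 4 \end{bmatrix} = (\lambda - 2)^2(\lambda - 8) = 0$$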



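Thus, the distinct eigenvalues of $A$ are $\lambda = 2$ and $\lambda = 8$. By the method used in Example 7 of Section 5.1, it
can be shown that

$$\mathbf{u}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \mathbf{u}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \tag{5}$$

form a basis for the eigenspace corresponding to $\lambda = 2$. Applying the Gram-Schmidt process to $\{\mathbf{u}_1, \mathbf{u}_2\}$
yields the following orthonormal eigenvectors (verify):

$$\mathbf{v}_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}$$

The eigenspace corresponding to $\lambda = 8$ has

$$\mathbf{u}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

as a basis. Applying the Gram-Schmidt process to $\{\mathbf{u}_3\}$ (i.e., normalizing $\mathbf{u}_3$) yields

$$\mathbf{v}_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}$$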

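Finally, using $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ as column vectors, we obtain

$$P = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}$$

which orthogonally diagonalizes $A$. As a check, we leave it for you to confirm that

$$P^TAP = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{bmatrix} \tag{6}$$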
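The three-step procedure can be checked numerically. The following is a minimal sketch in plain Python, assuming the eigenspace bases $\{(-1, 1, 0), (-1, 0, 1)\}$ for $\lambda = 2$ and $\{(1, 1, 1)\}$ for $\lambda = 8$ from Example 1; the helper names `dot`, `gram_schmidt`, and `mat_vec` are illustrative and not part of the text.

```python
from math import sqrt

def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Return an orthonormal list spanning the same subspace as `basis`."""
    ortho = []
    for u in basis:
        w = list(u)
        for q in ortho:
            c = dot(u, q)  # component of u along the unit vector q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = sqrt(dot(w, w))
        ortho.append([wi / norm for wi in w])
    return ortho

def mat_vec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [dot(row, v) for row in M]

A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]

# Step 1 (bases taken from Example 1) and Step 2 (Gram-Schmidt per eigenspace).
cols = gram_schmidt([[-1, 1, 0], [-1, 0, 1]]) + gram_schmidt([[1, 1, 1]])

# Step 3: entry (i, j) of P^T A P is p_i . (A p_j), where p_i are the columns of P.
D = [[dot(pi, mat_vec(A, pj)) for pj in cols] for pi in cols]

for row in D:
    print([round(entry, 10) for entry in row])
```

Up to rounding, the printed matrix should be diagonal with entries 2, 2, 8, in the same order as the eigenvectors were placed in `cols`.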



Spectral Decomposition 



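If $A$ is a symmetric matrix that is orthogonally diagonalized by

$$P = [\mathbf{u}_1 \ \ \mathbf{u}_2 \ \ \cdots \ \ \mathbf{u}_n]$$

and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ corresponding to the unit eigenvectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$, then we know that
$D = P^TAP$, where $D$ is a diagonal matrix with the eigenvalues in the diagonal positions. It follows from this that the matrix
$A$ can be expressed as

$$A = PDP^T = [\mathbf{u}_1 \ \ \mathbf{u}_2 \ \ \cdots \ \ \mathbf{u}_n] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} = [\lambda_1\mathbf{u}_1 \ \ \lambda_2\mathbf{u}_2 \ \ \cdots \ \ \lambda_n\mathbf{u}_n] \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix}$$

Multiplying out, we obtain the formula

$$A = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \lambda_2\mathbf{u}_2\mathbf{u}_2^T + \cdots + \lambda_n\mathbf{u}_n\mathbf{u}_n^T \tag{7}$$

which is called a spectral decomposition of $A$.

Note that each term of the spectral decomposition of $A$ has the form $\lambda\mathbf{u}\mathbf{u}^T$, where $\mathbf{u}$ is a unit eigenvector of $A$ in column
form, and $\lambda$ is an eigenvalue of $A$ corresponding to $\mathbf{u}$. Since $\mathbf{u}$ has size $n \times 1$, it follows that the product $\mathbf{u}\mathbf{u}^T$ has size $n \times n$. It
can be proved (though we will not do it) that $\mathbf{u}\mathbf{u}^T$ is the standard matrix for the orthogonal projection of $R^n$ on the subspace
spanned by the vector $\mathbf{u}$. Accepting this to be so, the spectral decomposition of $A$ tells us that the image of a vector $\mathbf{x}$ under
multiplication by a symmetric matrix $A$ can be obtained by projecting $\mathbf{x}$ orthogonally on the lines (one-dimensional
subspaces) determined by the eigenvectors of $A$, then scaling those projections by the eigenvalues, and then adding the scaled
projections. Here is an example.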

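EXAMPLE 2 A Geometric Interpretation of a Spectral Decomposition

The matrix

$$A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}$$

has eigenvalues $\lambda_1 = -3$ and $\lambda_2 = 2$ with corresponding eigenvectors

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad \text{and} \quad \mathbf{x}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

(verify). Normalizing these basis vectors yields

$$\mathbf{u}_1 = \frac{\mathbf{x}_1}{\|\mathbf{x}_1\|} = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix} \quad \text{and} \quad \mathbf{u}_2 = \frac{\mathbf{x}_2}{\|\mathbf{x}_2\|} = \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}$$

so a spectral decomposition of $A$ is

$$\begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \lambda_2\mathbf{u}_2\mathbf{u}_2^T = (-3)\begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \end{bmatrix} + (2)\begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$$

$$= (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix} \tag{8}$$

where, as noted above, the 2 × 2 matrices on the right side of (8) are the standard matrices for the orthogonal
projections onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$, respectively.

Now let us see what this spectral decomposition tells us about the image of the vector $\mathbf{x} = (1, 1)$ under
multiplication by $A$. Writing $\mathbf{x}$ in column form, it follows that

$$A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \tag{9}$$

and from (8) that

$$A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

$$= (-3)\begin{bmatrix} -\frac{1}{5} \\ \frac{2}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{6}{5} \\ \frac{3}{5} \end{bmatrix} = \begin{bmatrix} \frac{3}{5} \\ -\frac{6}{5} \end{bmatrix} + \begin{bmatrix} \frac{12}{5} \\ \frac{6}{5} \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \tag{10}$$

Formulas (9) and (10) provide two different ways of viewing the image of the vector $(1, 1)$ under multiplication by
$A$: Formula (9) tells us directly that the image of this vector is $(3, 0)$, whereas Formula (10) tells us that this image
can also be obtained by projecting $(1, 1)$ onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$ to obtain
the vectors $\left(-\frac{1}{5}, \frac{2}{5}\right)$ and $\left(\frac{6}{5}, \frac{3}{5}\right)$, then scaling by the eigenvalues to obtain $\left(\frac{3}{5}, -\frac{6}{5}\right)$ and $\left(\frac{12}{5}, \frac{6}{5}\right)$, and then
adding these vectors (see Figure 7.2.1).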




Ax = (3,0) 



Figure 7.2.1 
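The "project, scale, add" reading of a spectral decomposition can be traced in a few lines of code. This is a sketch only, using the eigenvalues $-3$ and $2$ and the unit eigenvectors $(1, -2)/\sqrt{5}$ and $(2, 1)/\sqrt{5}$ of the matrix in Example 2; the helper names `outer` and `mat_vec` are illustrative.

```python
from math import sqrt

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]

def mat_vec(M, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Eigenvalue / unit-eigenvector pairs of A = [[1, 2], [2, -2]] from Example 2.
eigenpairs = [
    (-3, [1 / sqrt(5), -2 / sqrt(5)]),
    (2, [2 / sqrt(5), 1 / sqrt(5)]),
]

x = [1, 1]
image = [0.0, 0.0]
for lam, u in eigenpairs:
    projection = mat_vec(outer(u, u), x)  # orthogonal projection of x onto span{u}
    image = [s + lam * p for s, p in zip(image, projection)]  # scale, then add

# Rebuild A itself as the sum of lam * u u^T.
A_rebuilt = [[sum(lam * outer(u, u)[i][j] for lam, u in eigenpairs)
              for j in range(2)] for i in range(2)]

print([round(c, 10) for c in image])
print([[round(c, 10) for c in row] for row in A_rebuilt])
```

The first printed vector should agree with Formula (9), and the rebuilt matrix should reproduce $A$ up to rounding.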



The Nondiagonalizable Case 



If $A$ is an $n \times n$ matrix that is not orthogonally diagonalizable, it may still be possible to achieve considerable simplification
in the form of $P^TAP$ by choosing the orthogonal matrix $P$ appropriately. We will consider two theorems (without proof) that
illustrate this. The first, due to the German mathematician Issai Schur, states that every square matrix $A$ with real entries and real eigenvalues is orthogonally
similar to an upper triangular matrix that has the eigenvalues of $A$ on the main diagonal.



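THEOREM 7.2.3 Schur's Theorem

If $A$ is an $n \times n$ matrix with real entries and real eigenvalues, then there is an orthogonal matrix $P$ such that $P^TAP$ is
an upper triangular matrix of the form

$$P^TAP = \begin{bmatrix} \lambda_1 & \times & \times & \cdots & \times \\ 0 & \lambda_2 & \times & \cdots & \times \\ 0 & 0 & \lambda_3 & \cdots & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix} \tag{11}$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of the matrix $A$ repeated according to multiplicity.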




Issai Schur (1875-1941) 

Historical Note The life of the German mathematician Issai Schur is a sad reminder of the effect that Nazi policies 
had on Jewish intellectuals during the 1930s. Schur was a brilliant mathematician and a popular lecturer who 
attracted many students and researchers to the University of Berlin, where he worked and taught. His lectures 
sometimes attracted so many students that opera glasses were needed to see him from the back row. Schur's life 
became increasingly difficult under Nazi rule, and in April of 1933 he was forced to "retire" from the university 
under a law that prohibited non-Aryans from holding "civil service" positions. There was an outcry from many of his
students and colleagues who respected and liked him, but it did not stave off his complete dismissal in 1935. Schur,
who thought of himself as a loyal German, never understood the persecution and humiliation he received at Nazi
hands. He left Germany for Palestine in 1939, a broken man. Lacking in financial resources, he had to sell his
beloved mathematics books and lived in poverty until his death in 1941.
[Image: Courtesy Electronic Publishing Services, Inc., New York City]



It is common to denote the upper triangular matrix in (11) by $S$ (for Schur), in which case that equation can be rewritten as

$$A = PSP^T \tag{12}$$

which is called a Schur decomposition of $A$.

The next theorem, due to the German mathematician and engineer Karl Hessenberg (1904-1959), states that every square
matrix with real entries is orthogonally similar to a matrix in which each entry below the first subdiagonal is zero (Figure
7.2.2). Such a matrix is said to be in upper Hessenberg form.




First subdiagonal 



Figure 7.2.2 



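THEOREM 7.2.4 Hessenberg's Theorem

If $A$ is an $n \times n$ matrix, then there is an orthogonal matrix $P$ such that $P^TAP$ is a matrix of the form

$$P^TAP = \begin{bmatrix} \times & \times & \cdots & \times & \times & \times \\ \times & \times & \cdots & \times & \times & \times \\ 0 & \times & \cdots & \times & \times & \times \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & \times & \times & \times \\ 0 & 0 & \cdots & 0 & \times & \times \end{bmatrix} \tag{13}$$

Note that unlike those in (11), the diagonal entries in (13)
are usually not the eigenvalues of $A$.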



It is common to denote the upper Hessenberg matrix in (13) by $H$ (for Hessenberg), in which case that equation can be
rewritten as

$$A = PHP^T \tag{14}$$

which is called an upper Hessenberg decomposition of $A$.

Remark In many numerical algorithms the initial matrix is first converted to upper Hessenberg form to reduce the amount 
of computation in subsequent parts of the algorithm. Many computer packages have built-in commands for finding Schur and 
Hessenberg decompositions. 
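To make the idea of reducing a matrix to upper Hessenberg form concrete, here is a minimal sketch in plain Python. It uses Householder reflections, which is one standard way to carry out such a reduction; the text does not say which method any particular package uses, so this choice, the function name `hessenberg`, and the sample matrix are assumptions made for illustration.

```python
from math import sqrt, copysign

def hessenberg(A):
    """Return H = P^T A P in upper Hessenberg form, built from Householder reflections."""
    n = len(A)
    H = [[float(entry) for entry in row] for row in A]
    for k in range(n - 2):
        # Part of column k that lies below the first subdiagonal position.
        x = [H[i][k] for i in range(k + 1, n)]
        norm_x = sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already has the required zeros
        v = x[:]
        v[0] += copysign(norm_x, x[0])  # sign chosen to avoid cancellation
        norm_v = sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        m = len(v)
        # Left-multiply by the reflection I - 2 v v^T (acts on rows k+1..n-1).
        for j in range(n):
            d = sum(v[i] * H[k + 1 + i][j] for i in range(m))
            for i in range(m):
                H[k + 1 + i][j] -= 2.0 * v[i] * d
        # Right-multiply by the same reflection (acts on columns k+1..n-1).
        for i in range(n):
            d = sum(H[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                H[i][k + 1 + j] -= 2.0 * d * v[j]
    return H

A = [[4, 1, 2, 3],
     [1, 3, 0, 1],
     [2, 0, 2, 5],
     [3, 1, 5, 1]]
H = hessenberg(A)
for row in H:
    print([round(entry, 6) for entry in row])
```

Because each reflection is orthogonal and symmetric, the result is orthogonally similar to $A$. The sample matrix here is symmetric, so the output should also have zeros above the first superdiagonal, that is, it should be tridiagonal up to rounding.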



Concept Review 

• Orthogonally similar matrices 



• Orthogonally diagonalizable matrix 

• Spectral decomposition (or eigenvalue decomposition) 

• Schur decomposition 

• Subdiagonal 

• Upper Hessenberg form

• Upper Hessenberg decomposition

Skills 

• Be able to recognize an orthogonally diagonalizable matrix. 

• Know that eigenvalues of symmetric matrices are real numbers. 

• Know that for a symmetric matrix eigenvectors from different eigenspaces are orthogonal. 

• Be able to orthogonally diagonalize a symmetric matrix. 

• Be able to find the spectral decomposition of a symmetric matrix.

• Know the statement of Schur's Theorem. 

• Know the statement of Hessenberg's Theorem.



Exercise Set 7.2 

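1. Find the characteristic equation of the given symmetric matrix, and then by inspection determine the dimensions of the
eigenspaces.

(a) $\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$

(e) $\begin{bmatrix} 4 & 4 & 0 & 0 \\ 4 & 4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}$

Answer:

(a) $\lambda^2 - 5\lambda = 0$; $\lambda = 0$: one-dimensional; $\lambda = 5$: one-dimensional

(b) $\lambda^3 - 27\lambda - 54 = 0$; $\lambda = 6$: one-dimensional; $\lambda = -3$: two-dimensional

(c) $\lambda^3 - 3\lambda^2 = 0$; $\lambda = 3$: one-dimensional; $\lambda = 0$: two-dimensional

(d) $\lambda^3 - 12\lambda^2 + 36\lambda - 32 = 0$; $\lambda = 2$: two-dimensional; $\lambda = 8$: one-dimensional

(e) $\lambda^4 - 8\lambda^3 = 0$; $\lambda = 0$: three-dimensional; $\lambda = 8$: one-dimensional

(f) $\lambda^4 - 8\lambda^3 + 22\lambda^2 - 24\lambda + 9 = 0$; $\lambda = 1$: two-dimensional; $\lambda = 3$: two-dimensional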

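In Exercises 2-9, find a matrix $P$ that orthogonally diagonalizes $A$, and determine $P^{-1}AP$.

2. $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

3. $A = \begin{bmatrix} 6 & 2\sqrt{3} \\ 2\sqrt{3} & 7 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} -\frac{2}{\sqrt{7}} & \frac{\sqrt{3}}{\sqrt{7}} \\ \frac{\sqrt{3}}{\sqrt{7}} & \frac{2}{\sqrt{7}} \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 3 & 0 \\ 0 & 10 \end{bmatrix}$

5. $A = \begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} -\frac{4}{5} & 0 & \frac{3}{5} \\ 0 & 1 & 0 \\ \frac{3}{5} & 0 & \frac{4}{5} \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 25 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & -50 \end{bmatrix}$

6. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

7. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & 0 & \frac{2}{\sqrt{6}} \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$

8. $A = \begin{bmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

9. $A = \begin{bmatrix} -7 & 24 & 0 & 0 \\ 24 & 7 & 0 & 0 \\ 0 & 0 & -7 & 24 \\ 0 & 0 & 24 & 7 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} -\frac{4}{5} & \frac{3}{5} & 0 & 0 \\ \frac{3}{5} & \frac{4}{5} & 0 & 0 \\ 0 & 0 & -\frac{4}{5} & \frac{3}{5} \\ 0 & 0 & \frac{3}{5} & \frac{4}{5} \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} -25 & 0 & 0 & 0 \\ 0 & 25 & 0 & 0 \\ 0 & 0 & -25 & 0 \\ 0 & 0 & 0 & 25 \end{bmatrix}$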

10. Assuming that $b \neq 0$, find a matrix that orthogonally diagonalizes

$$\begin{bmatrix} a & b \\ b & a \end{bmatrix}$$

11. Prove that if $A$ is any $m \times n$ matrix, then $A^TA$ has an orthonormal set of $n$ eigenvectors.

12. (a) Show that if $\mathbf{v}$ is any $n \times 1$ matrix and $I$ is the $n \times n$ identity matrix, then $I - \mathbf{v}\mathbf{v}^T$ is orthogonally diagonalizable.

(b) Find a matrix $P$ that orthogonally diagonalizes $I - \mathbf{v}\mathbf{v}^T$ if

"1" 



v = 



13. Use the result in Exercise 19 of Section 5.1 to prove Theorem 7.2.2a for 2 × 2 symmetric matrices.

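14. Does there exist a 3 × 3 symmetric matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and corresponding eigenvectors

$$\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$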

If so, find such a matrix; if not, explain why not. 

15. Is the converse of Theorem 7.2.2b true? Explain.

Answer: 

No 

16. Find the spectral decomposition of each matrix, 
(a) 



(b) 
(c) 



[J 1] 

-3 1 

1 -3 

2 2 



(d) 



-2 0 -36 

0-3 0 
-36 0 -23 



17. Show that if $A$ is a symmetric orthogonal matrix, then 1 and $-1$ are the only possible eigenvalues.

18. (a) Find a 3 × 3 symmetric matrix whose eigenvalues are $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and for which the corresponding
eigenvectors are $\mathbf{v}_1 = (0, 1, -1)$, $\mathbf{v}_2 = (1, 0, 0)$, $\mathbf{v}_3 = (0, 1, 1)$.

(b) Is there a 3 × 3 symmetric matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and corresponding eigenvectors
$\mathbf{v}_1 = (0, 1, -1)$, $\mathbf{v}_2 = (1, 0, 0)$, $\mathbf{v}_3 = (1, 1, 1)$? Explain your reasoning.

19. Let $A$ be a diagonalizable matrix with the property that eigenvectors from distinct eigenvalues are orthogonal. Must $A$ be
symmetric? Explain your reasoning.

Answer: 

Yes 

20. Prove: If $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is an orthonormal basis for $R^n$, and if $A$ can be expressed as

$$A = c_1\mathbf{u}_1\mathbf{u}_1^T + c_2\mathbf{u}_2\mathbf{u}_2^T + \cdots + c_n\mathbf{u}_n\mathbf{u}_n^T$$

then $A$ is symmetric and has eigenvalues $c_1, c_2, \ldots, c_n$.

21. In this exercise we will establish that a matrix $A$ is orthogonally diagonalizable if and only if it is symmetric. We have
shown that an orthogonally diagonalizable matrix is symmetric. The harder part is to prove that a symmetric matrix $A$ is
orthogonally diagonalizable. We will proceed in two steps: first we will show that $A$ is diagonalizable, and then we will
build on that result to show that $A$ is orthogonally diagonalizable.

(a) Assume that $A$ is a symmetric $n \times n$ matrix. One way to prove that $A$ is diagonalizable is to show that for each
eigenvalue $\lambda_0$ the geometric multiplicity is equal to the algebraic multiplicity. For this purpose, assume that the
geometric multiplicity of $\lambda_0$ is $k$, let $B_0 = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ be an orthonormal basis for the eigenspace corresponding
to $\lambda_0$, extend this to an orthonormal basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$, and let $P$ be the matrix having the vectors of
$B$ as columns. As shown in Exercise 34(b) of Section 5.2, the product $AP$ can be written as

$$AP = P\begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}$$

Use the fact that $B$ is an orthonormal basis to prove that $X = 0$ [a zero matrix of size $k \times (n - k)$].

(b) It follows from part (a) and Exercise 34(c) of Section 5.2 that $A$ has the same characteristic polynomial as

$$C = \begin{bmatrix} \lambda_0 I_k & 0 \\ 0 & Y \end{bmatrix}$$

Use this fact and Exercise 34(d) of Section 5.2 to prove that the algebraic multiplicity of $\lambda_0$ is the same as the
geometric multiplicity of $\lambda_0$. This establishes that $A$ is diagonalizable.

(c) Use Theorem 7.2.2(b) and the fact that $A$ is diagonalizable to prove that $A$ is orthogonally diagonalizable.

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer.

(a) If $A$ is a square matrix, then $AA^T$ and $A^TA$ are orthogonally diagonalizable.
Answer:

True

(b) If $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors from distinct eigenspaces of a symmetric matrix, then $\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2$.
Answer: 



True 

(c) Every orthogonal matrix is orthogonally diagonalizable. 
Answer: 

False 

(d) If $A$ is both invertible and orthogonally diagonalizable, then $A^{-1}$ is orthogonally diagonalizable.
Answer: 

True 

(e) Every eigenvalue of an orthogonal matrix has absolute value 1 . 
Answer: 

True 

(f) If $A$ is an $n \times n$ orthogonally diagonalizable matrix, then there exists an orthonormal basis for $R^n$ consisting of
eigenvectors of $A$.

Answer:

True

(g) If A is orthogonally diagonalizable, then A has real eigenvalues. 
Answer: 

True 
7.3 Quadratic Forms



In this section we will use matrix methods to study real-valued functions of several variables in which each term is either the 
square of a variable or the product of two variables. Such functions arise in a variety of applications, including geometry, 
vibrations of mechanical systems, statistics, and electrical engineering. 



Definition of a Quadratic Form 

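Expressions of the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n$$

occurred in our study of linear equations and linear systems. If $a_1, a_2, \ldots, a_n$ are treated as fixed constants, then this expression
is a real-valued function of the $n$ variables $x_1, x_2, \ldots, x_n$ and is called a linear form on $R^n$. All variables in a linear form occur
to the first power and there are no products of variables. Here we will be concerned with quadratic forms on $R^n$, which are
functions of the form

$$a_1x_1^2 + a_2x_2^2 + \cdots + a_nx_n^2 + (\text{all possible terms } a_kx_ix_j \text{ in which } x_i \neq x_j)$$

The terms of the form $a_kx_ix_j$ are called cross product terms. It is common to combine the cross product terms involving $x_ix_j$
with those involving $x_jx_i$ to avoid duplication. Thus, a general quadratic form on $R^2$ would typically be expressed as

$$a_1x_1^2 + a_2x_2^2 + 2a_3x_1x_2 \tag{1}$$

and a general quadratic form on $R^3$ as

$$a_1x_1^2 + a_2x_2^2 + a_3x_3^2 + 2a_4x_1x_2 + 2a_5x_1x_3 + 2a_6x_2x_3 \tag{2}$$

If, as usual, we do not distinguish between the number $a$ and the 1 × 1 matrix $[a]$, and if we let $\mathbf{x}$ be the column vector of
variables, then (1) and (2) can be expressed in matrix form as

$$\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} a_1 & a_3 \\ a_3 & a_2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{x}^TA\mathbf{x}$$

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} a_1 & a_4 & a_5 \\ a_4 & a_2 & a_6 \\ a_5 & a_6 & a_3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{x}^TA\mathbf{x}$$

(verify). Note that the matrix $A$ in these formulas is symmetric, that its diagonal entries are the coefficients of the squared terms,
and its off-diagonal entries are half the coefficients of the cross product terms. In general, if $A$ is a symmetric $n \times n$ matrix and $\mathbf{x}$
is an $n \times 1$ column vector of variables, then we call the function

$$Q_A(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} \tag{3}$$

the quadratic form associated with $A$. When convenient, (3) can be expressed in dot product notation as

$$\mathbf{x}^TA\mathbf{x} = \mathbf{x} \cdot A\mathbf{x} = A\mathbf{x} \cdot \mathbf{x} \tag{4}$$

In the case where $A$ is a diagonal matrix, the quadratic form $\mathbf{x}^TA\mathbf{x}$ has no cross product terms; for example, if $A$ has diagonal
entries $\lambda_1, \lambda_2, \ldots, \lambda_n$, then

$$\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \lambda_1x_1^2 + \lambda_2x_2^2 + \cdots + \lambda_nx_n^2$$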



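EXAMPLE 1 Expressing Quadratic Forms in Matrix Notation

In each part, express the quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where $A$ is symmetric.

(a) $2x^2 + 6xy - 5y^2$

(b) $x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3$

Solution The diagonal entries of $A$ are the coefficients of the squared terms, and the off-diagonal entries are half
the coefficients of the cross product terms, so

$$2x^2 + 6xy - 5y^2 = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 3 & -5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

$$x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 1 & 2 & -1 \\ 2 & 7 & 4 \\ -1 & 4 & -3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$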

Change of Variable in a Quadratic Form 

There are three important kinds of problems that occur in applications of quadratic forms:

Problem 1 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^2$ or $R^3$, what kind of curve or surface is represented by the equation $\mathbf{x}^TA\mathbf{x} = k$?

Problem 2 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what conditions must $A$ satisfy for $\mathbf{x}^TA\mathbf{x}$ to have positive values for $\mathbf{x} \neq \mathbf{0}$?

Problem 3 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what are its maximum and minimum values if $\mathbf{x}$ is constrained to satisfy
$\|\mathbf{x}\| = 1$?

We will consider the first two problems in this section and the third problem in the next section.

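Many of the techniques for solving these problems are based on simplifying the quadratic form $\mathbf{x}^TA\mathbf{x}$ by making a substitution

$$\mathbf{x} = P\mathbf{y} \tag{5}$$

that expresses the variables $x_1, x_2, \ldots, x_n$ in terms of new variables $y_1, y_2, \ldots, y_n$. If $P$ is invertible, then we call (5) a change of
variable, and if $P$ is orthogonal, then we call (5) an orthogonal change of variable.

If we make the change of variable $\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$, then we obtain

$$\mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^TP^TAP\mathbf{y} = \mathbf{y}^T(P^TAP)\mathbf{y} \tag{6}$$

Since the matrix $B = P^TAP$ is symmetric (verify), the effect of the change of variable is to produce a new quadratic form $\mathbf{y}^TB\mathbf{y}$
in the variables $y_1, y_2, \ldots, y_n$. In particular, if we choose $P$ to orthogonally diagonalize $A$, then the new quadratic form will be
$\mathbf{y}^TD\mathbf{y}$, where $D$ is a diagonal matrix with the eigenvalues of $A$ on the main diagonal; that is,

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$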

Thus, we have the following result, called the principal axes theorem.

THEOREM 7.3.1 The Principal Axes Theorem

If $A$ is a symmetric $n \times n$ matrix, then there is an orthogonal change of variable that transforms the quadratic form
$\mathbf{x}^TA\mathbf{x}$ into a quadratic form $\mathbf{y}^TD\mathbf{y}$ with no cross product terms. Specifically, if $P$ orthogonally diagonalizes $A$, then making the
change of variable $\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$ yields the quadratic form

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ corresponding to the eigenvectors that form the successive columns of
$P$.



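EXAMPLE 2 An Illustration of the Principal Axes Theorem

Find an orthogonal change of variable that eliminates the cross product terms in the quadratic form
$Q = x_1^2 - x_3^2 - 4x_1x_2 + 4x_2x_3$, and express $Q$ in terms of the new variables.

Solution The quadratic form can be expressed in matrix notation as

$$Q = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

The characteristic equation of the matrix $A$ is

$$\begin{vmatrix} \lambda - 1 & 2 & 0 \\ 2 & \lambda & -2 \\ 0 & -2 & \lambda + 1 \end{vmatrix} = \lambda^3 - 9\lambda = \lambda(\lambda + 3)(\lambda - 3) = 0$$

so the eigenvalues are $\lambda = 0, -3, 3$. We leave it for you to show that orthonormal bases for the three eigenspaces
are

$$\lambda = 0: \begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \\ \frac{2}{3} \end{bmatrix}, \quad \lambda = -3: \begin{bmatrix} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{bmatrix}, \quad \lambda = 3: \begin{bmatrix} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{bmatrix}$$

Thus, a substitution $\mathbf{x} = P\mathbf{y}$ that eliminates the cross product terms is

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

This produces the new quadratic form

$$Q = \mathbf{y}^T(P^TAP)\mathbf{y} = \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}\begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = -3y_2^2 + 3y_3^2$$

in which there are no cross product terms.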
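A quick numerical spot check of this change of variable is sketched below. It assumes the orthogonal matrix $P$ whose columns are the unit eigenvectors $(2, 1, 2)/3$, $(-1, -2, 2)/3$, and $(-2, 2, 1)/3$ for $\lambda = 0, -3, 3$; the test point `y` is arbitrary and the function names are illustrative.

```python
# Columns of P are unit eigenvectors for the eigenvalues 0, -3, 3 (in that order).
P = [[2 / 3, -1 / 3, -2 / 3],
     [1 / 3, -2 / 3, 2 / 3],
     [2 / 3, 2 / 3, 1 / 3]]

def Q_original(x):
    """Q = x1^2 - x3^2 - 4 x1 x2 + 4 x2 x3 in the original variables."""
    x1, x2, x3 = x
    return x1**2 - x3**2 - 4 * x1 * x2 + 4 * x2 * x3

def Q_diagonal(y):
    """Q = 0*y1^2 - 3*y2^2 + 3*y3^2 in the new variables."""
    return -3 * y[1]**2 + 3 * y[2]**2

y = [1.0, 2.0, -1.0]  # arbitrary test point in the new variables
x = [sum(P[i][j] * y[j] for j in range(3)) for i in range(3)]  # x = P y

print(Q_original(x), Q_diagonal(y))
```

Both values should agree up to rounding, which is what Formula (6) predicts for any choice of `y`.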



Remark If $A$ is a symmetric $n \times n$ matrix, then the quadratic form $\mathbf{x}^TA\mathbf{x}$ is a real-valued function whose range is the set of all
possible values for $\mathbf{x}^TA\mathbf{x}$ as $\mathbf{x}$ varies over $R^n$. It can be shown that an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ does not alter the
range of a quadratic form; that is, the set of all values for $\mathbf{x}^TA\mathbf{x}$ as $\mathbf{x}$ varies over $R^n$ is the same as the set of all values for
$\mathbf{y}^T(P^TAP)\mathbf{y}$ as $\mathbf{y}$ varies over $R^n$.



Quadratic Forms in Geometry 

Recall that a conic section or conic is a curve that results by cutting a double-napped cone with a plane (Figure 7.3.1). The most 
important conic sections are ellipses, hyperbolas, and parabolas, which result when the cutting plane does not pass through the 
vertex. Circles are special cases of ellipses that result when the cutting plane is perpendicular to the axis of symmetry of the 
cone. If the cutting plane passes through the vertex, then the resulting intersection is called a degenerate conic. The possibilities 
are a point, a pair of intersecting lines, or a single line. 




Figure 7.3.1 Conic sections: circle, ellipse, parabola, hyperbola.

Figure 7.3.2 A central conic rotated out of standard position.

Quadratic forms in $R^2$ arise naturally in the study of conic sections. For example, it is shown in analytic geometry that an
equation of the form

$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0 \tag{7}$$

in which $a$, $b$, and $c$ are not all zero, represents a conic section. If $d = e = 0$ in (7), then there are no linear terms, so the equation
becomes

$$ax^2 + 2bxy + cy^2 + f = 0 \tag{8}$$

and is said to represent a central conic. These include circles, ellipses, and hyperbolas, but not parabolas. Furthermore, if $b = 0$
in (8), then there is no cross product term (i.e., term involving $xy$), and the equation

$$ax^2 + cy^2 + f = 0 \tag{9}$$

is said to represent a central conic in standard position. The most important conics of this type are shown in Table 1.

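Table 1 (graphs omitted)

Ellipse: $\dfrac{x^2}{\alpha^2} + \dfrac{y^2}{\beta^2} = 1$ $(\alpha \geq \beta > 0)$

Ellipse: $\dfrac{x^2}{\alpha^2} + \dfrac{y^2}{\beta^2} = 1$ $(\beta \geq \alpha > 0)$

Hyperbola: $\dfrac{x^2}{\alpha^2} - \dfrac{y^2}{\beta^2} = 1$ $(\alpha > 0, \beta > 0)$

Hyperbola: $\dfrac{y^2}{\alpha^2} - \dfrac{x^2}{\beta^2} = 1$ $(\alpha > 0, \beta > 0)$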

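If we take the constant $f$ in Equations (8) and (9) to the right side and let $k = -f$, then we can rewrite these equations in matrix
form as

$$\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k \quad \text{and} \quad \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k \tag{10}$$

The first of these corresponds to Equation (8), in which there is a cross product term $2bxy$, and the second corresponds to Equation
(9), in which there is no cross product term. Geometrically, the existence of a cross product term signals that the graph of the
quadratic form is rotated about the origin, as in Figure 7.3.2. The three-dimensional analogs of the equations in (10) are

$$\begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} a & d & e \\ d & b & f \\ e & f & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k \quad \text{and} \quad \begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k \tag{11}$$

If $a$, $b$, and $c$ are not all zero, then the graphs of these equations in $R^3$ are called central quadrics; the graph of the second
equation, which has no cross product terms, is a central quadric in standard position.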



Identifying Conic Sections 

We are now ready to consider the first of the three problems posed earlier, identifying the curve or surface represented by an
equation $\mathbf{x}^TA\mathbf{x} = k$ in two or three variables. We will focus on the two-variable case. We noted above that an equation of the
form

$$ax^2 + 2bxy + cy^2 + f = 0 \tag{12}$$

represents a central conic. If $b = 0$, then the conic is in standard position, and if $b \neq 0$, it is rotated. It is an easy matter to
identify central conics in standard position by matching the equation with one of the standard forms. For example, the equation

$$9x^2 + 16y^2 - 144 = 0$$

can be rewritten as

$$\frac{x^2}{16} + \frac{y^2}{9} = 1$$

which, by comparison with Table 1, is the ellipse shown in Figure 7.3.3.




Figure 7.3.3 



If a central conic is rotated out of standard position, then it can be identified by first rotating the coordinate axes to put it in
standard position and then matching the resulting equation with one of the standard forms in Table 1. To find a rotation that
eliminates the cross product term in the equation

$$ax^2 + 2bxy + cy^2 = k \tag{13}$$

it will be convenient to express the equation in the matrix form

$$\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k \tag{14}$$

and look for a change of variable

$$\mathbf{x} = P\mathbf{x}'$$

that diagonalizes $A$ and for which $\det(P) = 1$. Since we saw in Example 4 of Section 7.1 that the transition matrix

$$P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{15}$$

has the effect of rotating the $xy$-axes of a rectangular coordinate system through an angle $\theta$, our problem reduces to finding $\theta$ that
diagonalizes $A$, thereby eliminating the cross product term in (13). If we make this change of variable, then in the $x'y'$-coordinate
system, Equation (14) will become

$$\mathbf{x}'^TD\mathbf{x}' = \begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = k \tag{16}$$

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$. The conic can now be identified by writing (16) in the form

$$\lambda_1x'^2 + \lambda_2y'^2 = k \tag{17}$$

and performing the necessary algebra to match it with one of the standard forms in Table 1. For example, if $\lambda_1$, $\lambda_2$, and $k$ are
positive, then (17) represents an ellipse with an axis of length $2\sqrt{k/\lambda_1}$ in the $x'$-direction and $2\sqrt{k/\lambda_2}$ in the $y'$-direction. The
first column vector of $P$, which is a unit eigenvector corresponding to $\lambda_1$, is along the positive $x'$-axis; and the second column
vector of $P$, which is a unit eigenvector corresponding to $\lambda_2$, is a unit vector along the $y'$-axis. These are called the principal
axes of the ellipse, which explains why Theorem 7.3.1 is called "the principal axes theorem." (See Figure 7.3.4.)



Figure 7.3.4 (unit eigenvector for $\lambda_1$: $(\cos\theta, \sin\theta)$; unit eigenvector for $\lambda_2$: $(-\sin\theta, \cos\theta)$)



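EXAMPLE 3 Identifying a Conic by Eliminating the Cross Product Term

(a) Identify the conic whose equation is $5x^2 - 4xy + 8y^2 - 36 = 0$ by rotating the $xy$-axes to put the conic in
standard position.

(b) Find the angle $\theta$ through which you rotated the $xy$-axes in part (a).

Solution

(a) The given equation can be written in the matrix form

$$\mathbf{x}^TA\mathbf{x} = 36$$

where

$$A = \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix}$$

The characteristic polynomial of $A$ is

$$\begin{vmatrix} \lambda - 5 & 2 \\ 2 & \lambda - 8 \end{vmatrix} = (\lambda - 4)(\lambda - 9)$$

so the eigenvalues are $\lambda = 4$ and $\lambda = 9$. We leave it for you to show that orthonormal bases for the eigenspaces
are

$$\lambda = 4: \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}, \quad \lambda = 9: \begin{bmatrix} -\frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{bmatrix}$$

Thus, $A$ is orthogonally diagonalized by

$$P = \begin{bmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix} \tag{18}$$

Had it turned out that $\det(P) = -1$, then we
would have interchanged the columns to reverse the
sign.

Moreover, it happens by chance that $\det(P) = 1$, so we are assured that the substitution $\mathbf{x} = P\mathbf{x}'$ performs a
rotation of axes. It follows from (16) that the equation of the conic in the $x'y'$-coordinate system is

$$\begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 9 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = 36$$

which we can write as

$$4x'^2 + 9y'^2 = 36 \quad \text{or} \quad \frac{x'^2}{9} + \frac{y'^2}{4} = 1$$

We can now see from Table 1 that the conic is an ellipse whose axis has length $2\alpha = 6$ in the $x'$-direction and
length $2\beta = 4$ in the $y'$-direction.

(b) It follows from (15) that

$$P = \begin{bmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

which implies that

$$\cos\theta = \frac{2}{\sqrt{5}}, \quad \sin\theta = \frac{1}{\sqrt{5}}, \quad \tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{1}{2}$$

Thus, $\theta = \tan^{-1}\frac{1}{2} \approx 26.6^\circ$ (Figure 7.3.5).

Figure 7.3.5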



Remark In the exercises we will ask you to show that if $b \neq 0$, then the cross product term in the equation

\[ ax^2 + 2bxy + cy^2 = k \]

can be eliminated by a rotation through an angle $\theta$ that satisfies

\[ \cot 2\theta = \frac{a - c}{2b} \]



We leave it for you to confirm that this is consistent with part (b) of the last example. 
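The consistency check suggested in the remark can also be done numerically. The following is a minimal Python sketch rather than anything from the text: the helper names `rotation_angle` and `rotate_form` are hypothetical, and picking the solution of $\cot 2\theta = (a - c)/(2b)$ with $0 < \theta < \pi/2$ is an assumption made here so that the result lines up with part (b) of Example 3.

```python
import math

def rotation_angle(a, b, c):
    """Angle theta in (0, pi/2) with cot(2*theta) = (a - c) / (2*b); needs b != 0."""
    if b == 0:
        raise ValueError("there is no cross product term to eliminate")
    # atan2(1, t) is the arccotangent of t taken in (0, pi).
    return 0.5 * math.atan2(1.0, (a - c) / (2.0 * b))

def rotate_form(a, b, c, theta):
    """Coefficients (a', b', c') of a*x^2 + 2*b*x*y + c*y^2 after rotating the axes by theta."""
    cs, sn = math.cos(theta), math.sin(theta)
    a_new = a * cs * cs + 2 * b * sn * cs + c * sn * sn
    b_new = (c - a) * sn * cs + b * (cs * cs - sn * sn)
    c_new = a * sn * sn - 2 * b * sn * cs + c * cs * cs
    return a_new, b_new, c_new

# Example 3: 5x^2 - 4xy + 8y^2 = 36, so a = 5, 2b = -4, c = 8.
theta = rotation_angle(5, -2, 8)
a_new, b_new, c_new = rotate_form(5, -2, 8, theta)
theta_degrees = math.degrees(theta)
```

Under these assumptions `theta_degrees` comes out near 26.6, the cross product coefficient `b_new` vanishes up to rounding, and `a_new`, `c_new` reproduce the eigenvalues 4 and 9 found in Example 3.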



Positive Definite Quadratic Forms 



We will now consider the second of the two problems posed earlier, determining conditions under which $\mathbf{x}^T A \mathbf{x} > 0$ for
nonzero values of $\mathbf{x}$. We will explain why this is important shortly, but first we introduce some terminology.



DEFINITION 1

A quadratic form $\mathbf{x}^T A \mathbf{x}$ is said to be

positive definite if $\mathbf{x}^T A \mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$

negative definite if $\mathbf{x}^T A \mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$

indefinite if $\mathbf{x}^T A \mathbf{x}$ has both positive and negative values

The terminology in Definition 1 also applies to the
matrix $A$; that is, $A$ is positive definite, negative definite,
or indefinite in accordance with whether the associated
quadratic form has that property.

The following theorem, whose proof is deferred to the end of the section, provides a way of using eigenvalues to determine
whether a matrix $A$ and its associated quadratic form $\mathbf{x}^T A \mathbf{x}$ are positive definite, negative definite, or indefinite.



THEOREM 7.3.2

If $A$ is a symmetric matrix, then:

(a) $\mathbf{x}^T A \mathbf{x}$ is positive definite if and only if all eigenvalues of $A$ are positive.

(b) $\mathbf{x}^T A \mathbf{x}$ is negative definite if and only if all eigenvalues of $A$ are negative.

(c) $\mathbf{x}^T A \mathbf{x}$ is indefinite if and only if $A$ has at least one positive eigenvalue and at least one negative eigenvalue.



Remark The three classifications in Definition 1 do not exhaust all of the possibilities. For example, a quadratic form for
which $\mathbf{x}^T A \mathbf{x} \geq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called positive semidefinite, and one for which $\mathbf{x}^T A \mathbf{x} \leq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called negative semidefinite.
Every positive definite form is positive semidefinite, but not conversely, and every negative definite form is negative
semidefinite, but not conversely (why?). By adjusting the proof of Theorem 7.3.2 appropriately, one can prove that $\mathbf{x}^T A \mathbf{x}$ is
positive semidefinite if and only if all eigenvalues of $A$ are nonnegative and is negative semidefinite if and only if all
eigenvalues of $A$ are nonpositive.



EXAMPLE 4 Positive Definite Quadratic Forms

It is not usually possible to tell from the signs of the entries in a symmetric matrix $A$ whether that matrix is
positive definite, negative definite, or indefinite. For example, the entries of the matrix

\[ A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix} \]

are nonnegative, but the matrix is indefinite since its eigenvalues are $\lambda = 1, 4, -2$ (verify). To see this another
way, let us write out the quadratic form as

\[
\mathbf{x}^T A \mathbf{x} =
\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= 3x_1^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3
\]

We can now see, for example, that

\[ \mathbf{x}^T A \mathbf{x} = 4 \quad \text{for} \quad x_1 = 0,\ x_2 = 1,\ x_3 = 1 \]

and

\[ \mathbf{x}^T A \mathbf{x} = -4 \quad \text{for} \quad x_1 = 0,\ x_2 = 1,\ x_3 = -1 \]

Positive definite and negative definite matrices
are invertible. Why?
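The "(verify)" step in Example 4 can be carried out with a few lines of Python. This is only an illustrative sketch: the helper names `quad_form`, `det3`, and `char_poly_at` are hypothetical, and the eigenvalues are confirmed by checking that $\det(\lambda I - A) = 0$ at the three stated values rather than by solving for them.

```python
def quad_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_at(A, lam):
    """Evaluate det(lam*I - A) for a 3x3 matrix A."""
    M = [[(lam if r == s else 0) - A[r][s] for s in range(3)] for r in range(3)]
    return det3(M)

A = [[3, 1, 1], [1, 0, 2], [1, 2, 0]]

# The two sample points used in the example give values of opposite sign.
values = [quad_form(A, [0, 1, 1]), quad_form(A, [0, 1, -1])]

# Each stated eigenvalue is a root of the characteristic polynomial.
roots_ok = all(char_poly_at(A, lam) == 0 for lam in (1, 4, -2))
```

Because all entries are integers, the arithmetic is exact: `values` holds one positive and one negative number, which is what makes the form indefinite.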



Classifying Conic Sections Using Eigenvalues 

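If $\mathbf{x}^T B \mathbf{x} = k$ is the equation of a conic, and if $k \neq 0$, then we can divide through by $k$ and rewrite the equation in the form

\[ \mathbf{x}^T A \mathbf{x} = 1 \tag{20} \]

where $A = (1/k)B$. If we now rotate the coordinate axes to eliminate the cross product term (if any) in this equation, then the
equation of the conic in the new coordinate system will be of the form

\[ \lambda_1 (x')^2 + \lambda_2 (y')^2 = 1 \tag{21} \]

in which $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$. The particular type of conic represented by this equation will depend on the signs
of the eigenvalues $\lambda_1$ and $\lambda_2$. For example, you should be able to see from 21 that:

• $\mathbf{x}^T A \mathbf{x} = 1$ represents an ellipse if $\lambda_1 > 0$ and $\lambda_2 > 0$.

• $\mathbf{x}^T A \mathbf{x} = 1$ has no graph if $\lambda_1 < 0$ and $\lambda_2 < 0$.

• $\mathbf{x}^T A \mathbf{x} = 1$ represents a hyperbola if $\lambda_1$ and $\lambda_2$ have opposite signs.

In the case of the ellipse, Equation 21 can be rewritten as

\[ \frac{(x')^2}{(1/\sqrt{\lambda_1})^2} + \frac{(y')^2}{(1/\sqrt{\lambda_2})^2} = 1 \tag{22} \]

so the axes of the ellipse have lengths $2/\sqrt{\lambda_1}$ and $2/\sqrt{\lambda_2}$ (Figure 7.3.6).

Figure 7.3.6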



The following theorem is an immediate consequence of this discussion and Theorem 7.3.2. 



THEOREM 7.3.3

If $A$ is a symmetric $2 \times 2$ matrix, then:

(a) $\mathbf{x}^T A \mathbf{x} = 1$ represents an ellipse if $A$ is positive definite.

(b) $\mathbf{x}^T A \mathbf{x} = 1$ has no graph if $A$ is negative definite.

(c) $\mathbf{x}^T A \mathbf{x} = 1$ represents a hyperbola if $A$ is indefinite.



In Example 3 we performed a rotation to show that the equation

\[ 5x^2 - 4xy + 8y^2 - 36 = 0 \]

represents an ellipse with a major axis of length 6 and a minor axis of length 4. This conclusion can also be obtained by
rewriting the equation in the form

\[ \tfrac{5}{36}x^2 - \tfrac{1}{9}xy + \tfrac{2}{9}y^2 = 1 \]

and showing that the associated matrix

\[ A = \begin{bmatrix} \frac{5}{36} & -\frac{1}{18} \\ -\frac{1}{18} & \frac{2}{9} \end{bmatrix} \]

has eigenvalues $\lambda_1 = \frac{1}{9}$ and $\lambda_2 = \frac{1}{4}$. These eigenvalues are positive, so the matrix $A$ is positive definite and the equation
represents an ellipse. Moreover, it follows from 21 that the axes of the ellipse have lengths $2/\sqrt{\lambda_1} = 6$ and $2/\sqrt{\lambda_2} = 4$, which
is consistent with Example 3.
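The eigenvalue test for conics can be sketched in Python as follows. This is an illustrative sketch only: the function name `classify_conic` is hypothetical, and it relies on the closed-form eigenvalues of a symmetric $2 \times 2$ matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$, namely $\frac{a+c}{2} \pm \sqrt{\left(\frac{a-c}{2}\right)^2 + b^2}$, which is a standard formula rather than something stated in this section.

```python
import math

def classify_conic(a, b, c):
    """Classify x^T A x = 1 for A = [[a, b], [b, c]] by the signs of its eigenvalues."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam_small, lam_large = mean - radius, mean + radius
    if lam_small > 0:
        kind = "ellipse"
    elif lam_large < 0:
        kind = "no graph"
    elif lam_small < 0 < lam_large:
        kind = "hyperbola"
    else:
        kind = "not covered by Theorem 7.3.3 (a zero eigenvalue)"
    return kind, lam_small, lam_large

# The rewritten equation from Example 3: (5/36)x^2 - (1/9)xy + (2/9)y^2 = 1.
kind, lam_small, lam_large = classify_conic(5 / 36, -1 / 18, 2 / 9)

# For an ellipse the axis lengths are 2/sqrt(lambda).
axis_lengths = (2 / math.sqrt(lam_small), 2 / math.sqrt(lam_large))
```

For this matrix the sketch reports an ellipse with eigenvalues close to 1/9 and 1/4, so the axis lengths come out close to 6 and 4.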



Identifying Positive Definite Matrices 

Positive definite matrices are the most important symmetric matrices in applications, so it will be useful to learn a little more
about them. We already know that a symmetric matrix is positive definite if and only if its eigenvalues are all positive; now we
will give a criterion that can be used to determine whether a symmetric matrix is positive definite without finding the
eigenvalues. For this purpose we define the kth principal submatrix of an $n \times n$ matrix $A$ to be the $k \times k$ submatrix consisting of
the first $k$ rows and columns of $A$. For example, here are the principal submatrices of a general $4 \times 4$ matrix:

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\]

First principal submatrix: $\begin{bmatrix} a_{11} \end{bmatrix}$

Second principal submatrix: $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$

Third principal submatrix: $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$

Fourth principal submatrix $= A$



The following theorem, which we state without proof, provides a determinant test for ascertaining whether a symmetric matrix is 
positive definite. 



THEOREM 7.3.4 

A symmetric matrix A is positive definite if and only if the determinant of every principal submatrix is positive. 



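EXAMPLE 5 Working with Principal Submatrices

The matrix

\[ A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{bmatrix} \]

is positive definite since the determinants

\[
|2| = 2, \qquad
\begin{vmatrix} 2 & -1 \\ -1 & 2 \end{vmatrix} = 3, \qquad
\begin{vmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{vmatrix} = 1
\]

are all positive. Thus, we are guaranteed that all eigenvalues of $A$ are positive and $\mathbf{x}^T A \mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$.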
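The determinant test of Theorem 7.3.4 is easy to mechanize. The sketch below is one possible way to do it in Python, not a method prescribed by the text: the names `det`, `principal_minors`, and `is_positive_definite` are hypothetical, the determinant is computed by Gaussian elimination in exact rational arithmetic (an implementation choice made here to avoid rounding), and the input is assumed to be symmetric without being checked.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n, sign, result = len(M), 1, Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign  # a row swap flips the sign of the determinant
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for s in range(col, n):
                M[r][s] -= factor * M[col][s]
    return sign * result

def principal_minors(A):
    """Determinants of the first, second, ..., nth principal submatrices of A."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Determinant test of Theorem 7.3.4; A is assumed to be symmetric."""
    return all(d > 0 for d in principal_minors(A))

A = [[2, -1, -3], [-1, 2, 4], [-3, 4, 9]]
minors = principal_minors(A)
```

For the matrix of Example 5, `minors` reproduces the three determinants computed by hand, and the same function rejects the indefinite matrix of Example 4.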



OPTIONAL 

We conclude this section with an optional proof of Theorem 7.3.2. 

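Proofs of Theorem 7.3.2(a) and (b) It follows from the principal axes theorem (Theorem 7.3.1) that there is an orthogonal
change of variable $\mathbf{x} = P\mathbf{y}$ for which

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{y}^T D \mathbf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{23} \]

where the $\lambda$'s are the eigenvalues of $A$. Moreover, it follows from the invertibility of $P$ that $\mathbf{y} \neq \mathbf{0}$ if and only if $\mathbf{x} \neq \mathbf{0}$, so the
values of $\mathbf{x}^T A \mathbf{x}$ for $\mathbf{x} \neq \mathbf{0}$ are the same as the values of $\mathbf{y}^T D \mathbf{y}$ for $\mathbf{y} \neq \mathbf{0}$. Thus, it follows from 23 that $\mathbf{x}^T A \mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$ if and
only if all of the $\lambda$'s in that equation are positive, and that $\mathbf{x}^T A \mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all of the $\lambda$'s are negative. This
proves parts (a) and (b).

Proof (c) Assume that $A$ has at least one positive eigenvalue and at least one negative eigenvalue, and to be specific, suppose
that $\lambda_1 > 0$ and $\lambda_2 < 0$ in 23. Then

\[ \mathbf{x}^T A \mathbf{x} > 0 \quad \text{if } y_1 = 1 \text{ and all other } y\text{'s are } 0 \]

and

\[ \mathbf{x}^T A \mathbf{x} < 0 \quad \text{if } y_2 = 1 \text{ and all other } y\text{'s are } 0 \]

which proves that $\mathbf{x}^T A \mathbf{x}$ is indefinite. Conversely, if $\mathbf{x}^T A \mathbf{x} > 0$ for some $\mathbf{x}$, then $\mathbf{y}^T D \mathbf{y} > 0$ for some $\mathbf{y}$, so at least one of the $\lambda$'s
in 23 must be positive. Similarly, if $\mathbf{x}^T A \mathbf{x} < 0$ for some $\mathbf{x}$, then $\mathbf{y}^T D \mathbf{y} < 0$ for some $\mathbf{y}$, so at least one of the $\lambda$'s in 23 must be
negative, which completes the proof.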



Concept Review

• Linear form

• Quadratic form

• Cross product term

• Quadratic form associated with a matrix

• Change of variable

• Orthogonal change of variable

• Principal Axes Theorem

• Conic section

• Degenerate conic

• Central conic

• Standard position of a central conic

• Standard form of a central conic

• Central quadric

• Principal axes of an ellipse

• Positive definite quadratic form

• Negative definite quadratic form

• Indefinite quadratic form

• Positive semidefinite quadratic form

• Negative semidefinite quadratic form

• Principal submatrix

Skills

• Express a quadratic form in the matrix notation $\mathbf{x}^T A \mathbf{x}$, where $A$ is a symmetric matrix.

• Find an orthogonal change of variable that eliminates the cross product terms in a quadratic form, and express the
quadratic form in terms of the new variable.

• Identify a conic section from an equation by rotating axes to place the conic in standard position, and find the angle
of rotation.

• Identify a conic section using eigenvalues.

• Classify matrices and quadratic forms as positive definite, negative definite, indefinite, positive semidefinite, or
negative semidefinite.



Exercise Set 7.3

In Exercises 1-2, express the quadratic form in the matrix notation $\mathbf{x}^T A \mathbf{x}$, where $A$ is a symmetric matrix.



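1.

(a) $3x_1^2 + 7x_2^2$

(b) $4x_1^2 - 9x_2^2 - 6x_1x_2$

(c) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + \dots$

Answer:

(a) $\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

(b) $\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -3 & -9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$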

(c) 



9 



3 -4 



[^1 ^2^3] 



-4 



3 - 




^1 
^2 
^3 



2. (a) $5x_1^2 + 5x_1x_2$

(b) $-7x_1x_2$

(c) $x_1^2 + x_2^2 - 3x_3^2 - 5x_1x_2 + 9x_1x_3$

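In Exercises 3-4, find a formula for the quadratic form that does not use matrices.

3. $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & -3 \\ -3 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

Answer:

$2x^2 + 5y^2 - 6xy$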



4. 



['1 ^2 ^3] 



1 6 3 



^1 
^3 



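In Exercises 5-8, find an orthogonal change of variables that eliminates the cross product terms in the quadratic form $Q$, and
express $Q$ in terms of the new variables.

5. $Q = 2x_1^2 + 2x_2^2 - 2x_1x_2$
Answer:

\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}; \qquad Q = y_1^2 + 3y_2^2
\]

6. $Q = 5x_1^2 + 2x_2^2 + 4x_3^2 + 4x_1x_2$

7. $Q = 3x_1^2 + 4x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_2x_3$

Answer:

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} -\frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & \frac{2}{3} & -\frac{2}{3} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}; \qquad Q = y_1^2 + 4y_2^2 + 7y_3^2
\]

8. $Q = 2x_1^2 + 5x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_1x_3 - 8x_2x_3$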

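In Exercises 9-10, express the quadratic equation in the matrix form $\mathbf{x}^T A \mathbf{x} + K\mathbf{x} + f = 0$, where $\mathbf{x}^T A \mathbf{x}$ is the associated
quadratic form and $K$ is an appropriate matrix.

9. (a) $2x^2 + xy + x - 6y + 2 = 0$
(b) $y^2 + 7x - 8y - 5 = 0$

Answer:

(a) $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 1 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + 2 = 0$

10. (a) $x^2 - xy + 5x + 8y - 3 = 0$

(b) $5xy = 8$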

In Exercises 11-12, identify the conic section represented by the equation.

11. (a) $2x^2 + 5y^2 = 20$

(b) $x^2 - y^2 - 8 = 0$

(c) $7y^2 - 2x = 0$

(d) $x^2 + y^2 - 25 = 0$

Answer:

(a) ellipse

(b) hyperbola

(c) parabola

(d) circle

12- (a) 4x^ + = 1 

(b) 4x^-5y^ = 20 

(c) -x^ = 2y 

(d) x2-3 = 

In Exercises 13-16, identify the conic section represented by the equation by rotating axes to place the conic in standard
position. Find an equation of the conic in the rotated coordinates, and find the angle of rotation.

13. $2x^2 - 4xy - y^2 + 8 = 0$

Answer:

Hyperbola: $2(y')^2 - 3(x')^2 = 8$; $\theta \approx -26.6^\circ$

14. $5x^2 + 4xy + 5y^2 = 9$

15. $11x^2 + 24xy + 4y^2 - 15 = 0$

Answer:

Hyperbola: $4(x')^2 - (y')^2 = 3$; $\theta \approx 36.9^\circ$

In Exercises 17-18, determine by inspection whether the matrix is positive definite, negative definite, indefinite, positive 
semidefinite, or negative semidefinite. 

(b) r-1 0] 



(c) 


-1 01 

0 2j 


(d) 


'l 01 

0 oJ 


(e) 


[2-3] 


Answer: 


(a) Positive definite 


(b) Negative definite 


(c) Indefinite 


(d) Positive semidefinite 


(e) Negative semidefinite 


(a) 




(b) 


:i J] 


(c) 


2 Ol 


(d) 


0 Ol 
.0 -5j 


(e) 


'2 Ol 

0 oJ 



In Exercises 19-24, classify the quadratic form as positive definite, negative definite, indefinite, positive semidefinite, or
negative semidefinite.

Answer:

Positive definite

Answer:

Positive semidefinite

22. $-(x_1 - x_2)^2$

Answer:

Indefinite

24. $x_1x_2$

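In Exercises 25-26, show that the matrix $A$ is positive definite first by using Theorem 7.3.2 and second by using Theorem
7.3.4.

25. (a) $A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

26. (a) $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$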



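In Exercises 27-28, find all values of $k$ for which the quadratic form is positive definite.

27. $5x_1^2 + x_2^2 + kx_3^2 + 4x_1x_2 - 2x_1x_3 - 2x_2x_3$
Answer:

$k > 2$

28. $3x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_3 + 2kx_2x_3$

29. Let $\mathbf{x}^T A \mathbf{x}$ be a quadratic form in the variables $x_1, x_2, \ldots, x_n$, and define $T: R^n \to R$ by $T(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$.

(a) Show that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + 2\mathbf{x}^T A \mathbf{y} + T(\mathbf{y})$.

(b) Show that $T(c\mathbf{x}) = c^2 T(\mathbf{x})$.

30. Express the quadratic form $(c_1x_1 + c_2x_2 + \cdots + c_nx_n)^2$ in the matrix notation $\mathbf{x}^T A \mathbf{x}$, where $A$ is symmetric.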

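31. In statistics, the quantities

\[ \bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) \]

and

\[ s_x^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right] \]

are called, respectively, the sample mean and sample variance of $\mathbf{x} = (x_1, x_2, \ldots, x_n)$.

(a) Express the quadratic form $s_x^2$ in the matrix notation $\mathbf{x}^T A \mathbf{x}$, where $A$ is symmetric.

(b) Is $s_x^2$ a positive definite quadratic form? Explain.

Answer:

(a)

\[
A = \begin{bmatrix}
\frac{1}{n} & -\frac{1}{n(n-1)} & \cdots & -\frac{1}{n(n-1)} \\
-\frac{1}{n(n-1)} & \frac{1}{n} & \cdots & -\frac{1}{n(n-1)} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{n(n-1)} & -\frac{1}{n(n-1)} & \cdots & \frac{1}{n}
\end{bmatrix}
\]

(b) Yes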

32. The graph in an $xyz$-coordinate system of an equation of form $ax^2 + by^2 + cz^2 = 1$ in which $a$, $b$, and $c$ are positive is a
surface called a central ellipsoid in standard position (see the accompanying figure). This is the three-dimensional
generalization of the ellipse $ax^2 + by^2 = 1$ in the $xy$-plane. The intersections of the ellipsoid $ax^2 + by^2 + cz^2 = 1$ with the
coordinate axes determine three line segments called the axes of the ellipsoid. If a central ellipsoid is rotated about the origin
so two or more of its axes do not coincide with any of the coordinate axes, then the resulting equation will have one or more
cross product terms.

(a) Show that the equation 

represents an ellipsoid, and find the lengths of its axes. [Suggestion: Write the equation in the form $\mathbf{x}^T A \mathbf{x} = 1$ and make
an orthogonal change of variable to eliminate the cross product terms.]

(b) What property must a symmetric $3 \times 3$ matrix have in order for the equation $\mathbf{x}^T A \mathbf{x} = 1$ to represent an ellipsoid?




Figure Ex-32 

33. What property must a symmetric $2 \times 2$ matrix $A$ have for $\mathbf{x}^T A \mathbf{x} = 1$ to represent a circle?
Answer:

$A$ must have a positive eigenvalue of multiplicity 2.

34. Prove: If $b \neq 0$, then the cross product term can be eliminated from the quadratic form $ax^2 + 2bxy + cy^2$ by rotating the
coordinate axes through an angle $\theta$ that satisfies the equation

\[ \cot 2\theta = \frac{a - c}{2b} \]

35. Prove that if $A$ is an $n \times n$ symmetric matrix all of whose eigenvalues are nonnegative, then $\mathbf{x}^T A \mathbf{x} \geq 0$ for all nonzero $\mathbf{x}$ in $R^n$.

True-False Exercises

In parts (a)-(l) determine whether the statement is true or false, and justify your answer.

(a) A symmetric matrix with positive definite eigenvalues is positive definite.
Answer:

True

(b) $x_1^2 - x_2^2 + x_3^2 + 4x_1x_2x_3$ is a quadratic form.
Answer:

False

(c) $(x_1 - 3x_2)^2$ is a quadratic form.
Answer:

True

(d) A positive definite matrix is invertible.
Answer:

True

(e) A symmetric matrix is either positive definite, negative definite, or indefinite.
Answer:

False

(f) If $A$ is positive definite, then $-A$ is negative definite.
Answer:

True

(g) $\mathbf{x} \cdot \mathbf{x}$ is a quadratic form for all $\mathbf{x}$ in $R^n$.
Answer:

True

(h) If $\mathbf{x}^T A \mathbf{x}$ is a positive definite quadratic form, then so is $\mathbf{x}^T A^{-1} \mathbf{x}$.
Answer:

True

(i) If $A$ is a matrix with only positive eigenvalues, then $\mathbf{x}^T A \mathbf{x}$ is a positive definite quadratic form.
Answer:

False

(j) If $A$ is a $2 \times 2$ symmetric matrix with positive entries and $\det(A) > 0$, then $A$ is positive definite.
Answer:

True

(k) If $\mathbf{x}^T A \mathbf{x}$ is a quadratic form with no cross product terms, then $A$ is a diagonal matrix.
Answer:

False

(l) If $\mathbf{x}^T A \mathbf{x}$ is a positive definite quadratic form in two variables and $c \neq 0$, then the graph of the equation $\mathbf{x}^T A \mathbf{x} = c$ is an ellipse.
Answer:

False
7.4 Optimization Using Quadratic Forms



Quadratic forms arise in various problems in which the maximum or minimum value of some quantity is required. 
In this section we will discuss some problems of this type. 



Constrained Extremum Problems 



Our first goal in this section is to consider the problem of finding the maximum and minimum values of a
quadratic form $\mathbf{x}^T A \mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$. Problems of this type arise in a wide variety of
applications.

To visualize this problem geometrically in the case where $\mathbf{x}^T A \mathbf{x}$ is a quadratic form on $R^2$, view $z = \mathbf{x}^T A \mathbf{x}$ as the
equation of some surface in a rectangular $xyz$-coordinate system and view $\|\mathbf{x}\| = 1$ as the unit circle centered at
the origin of the $xy$-plane. Geometrically, the problem of finding the maximum and minimum values of $\mathbf{x}^T A \mathbf{x}$
subject to the requirement $\|\mathbf{x}\| = 1$ amounts to finding the highest and lowest points on the intersection of the
surface with the right circular cylinder determined by the circle (Figure 7.4.1).



Figure 7.4.1 (labels: constrained minimum; constrained maximum; unit circle)



The following theorem, whose proof is deferred to the end of the section, is the key result for solving problems of 
this type. 



THEOREM 7.4.1 Constrained Extremum Theorem

Let $A$ be a symmetric $n \times n$ matrix whose eigenvalues in order of decreasing size are
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then:

(a) the quadratic form $\mathbf{x}^T A \mathbf{x}$ attains a maximum value and a minimum value on the set of vectors for
which $\|\mathbf{x}\| = 1$;

(b) the maximum value attained in part (a) occurs at a unit vector corresponding to the eigenvalue $\lambda_1$;

(c) the minimum value attained in part (a) occurs at a unit vector corresponding to the eigenvalue $\lambda_n$.

Remark The condition $\|\mathbf{x}\| = 1$ in this theorem is called a constraint, and the maximum or minimum value of
$\mathbf{x}^T A \mathbf{x}$ subject to the constraint is called a constrained extremum. This constraint can also be expressed as
$\mathbf{x}^T \mathbf{x} = 1$ or as $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$, when convenient.



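EXAMPLE 1 Finding Constrained Extrema

Find the maximum and minimum values of the quadratic form

\[ z = 5x^2 + 5y^2 + 4xy \]

subject to the constraint $x^2 + y^2 = 1$.

Solution The quadratic form can be expressed in matrix notation as

\[
z = 5x^2 + 5y^2 + 4xy = \mathbf{x}^T A \mathbf{x} =
\begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\]

We leave it for you to show that the eigenvalues of $A$ are $\lambda_1 = 7$ and $\lambda_2 = 3$ and that corresponding
eigenvectors are

\[
\lambda_1 = 7:\ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
\lambda_2 = 3:\ \begin{bmatrix} -1 \\ 1 \end{bmatrix}
\]

Normalizing these eigenvectors yields

\[
\lambda_1 = 7:\ \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad
\lambda_2 = 3:\ \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \tag{1}
\]

Thus, the constrained extrema are

constrained maximum: $z = 7$ at $(x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$

constrained minimum: $z = 3$ at $(x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$

Remark Since the negatives of the eigenvectors in 1 are also unit eigenvectors, they too produce the maximum
and minimum values of $z$; that is, the constrained maximum $z = 7$ also occurs at the point
$(x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ and the constrained minimum $z = 3$ at $(x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$.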
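The recipe of Theorem 7.4.1 for a form in two variables can be sketched in Python. This is a hedged illustration, not the text's own procedure: the names `constrained_extrema` and `form` are hypothetical, and the eigenvalues come from the closed-form expression for a symmetric $2 \times 2$ matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$, an assumption brought in here to keep the sketch free of external libraries.

```python
import math

def form(a, b, c, x, y):
    """Value of a*x^2 + 2*b*x*y + c*y^2."""
    return a * x * x + 2 * b * x * y + c * y * y

def constrained_extrema(a, b, c):
    """Extrema of a*x^2 + 2*b*x*y + c*y^2 on the unit circle x^2 + y^2 = 1.

    Returns ((max_value, unit_vector), (min_value, unit_vector)).
    """
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)

    def unit_eigenvector(lam):
        if b != 0:
            v = (b, lam - a)      # solves (A - lam*I) v = 0 when b != 0
        elif abs(lam - a) <= abs(lam - c):
            v = (1.0, 0.0)        # diagonal matrix: eigenvalue a belongs to e1
        else:
            v = (0.0, 1.0)        # diagonal matrix: eigenvalue c belongs to e2
        norm = math.hypot(v[0], v[1])
        return (v[0] / norm, v[1] / norm)

    lam_max, lam_min = mean + radius, mean - radius
    return (lam_max, unit_eigenvector(lam_max)), (lam_min, unit_eigenvector(lam_min))

# Example 1: z = 5x^2 + 5y^2 + 4xy, so a = 5, 2b = 4, c = 5.
(z_max, v_max), (z_min, v_min) = constrained_extrema(5, 2, 5)

# Independent check: sample z around the unit circle.
samples = [form(5, 2, 5, math.cos(t), math.sin(t))
           for t in (2 * math.pi * k / 3600 for k in range(3600))]
```

The returned unit vectors may differ from those in 1 by a sign, which is harmless because the negative of a unit eigenvector is again a unit eigenvector. The sampled values stay between `z_min` and `z_max`.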

EXAMPLE 2 A Constrained Extremum Problem

A rectangle is to be inscribed in the ellipse $4x^2 + 9y^2 = 36$, as shown in Figure 7.4.2. Use
eigenvalue methods to find nonnegative values of $x$ and $y$ that produce the inscribed rectangle with
maximum area.

Figure 7.4.2 A rectangle inscribed in the ellipse $4x^2 + 9y^2 = 36$.



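Solution The area $z$ of the inscribed rectangle is given by $z = 4xy$, so the problem is to maximize
the quadratic form $z = 4xy$ subject to the constraint $4x^2 + 9y^2 = 36$. In this problem, the graph of
the constraint equation is an ellipse rather than the unit circle as required in Theorem 7.4.1, but we
can remedy this problem by rewriting the constraint as

\[ \left(\frac{x}{3}\right)^2 + \left(\frac{y}{2}\right)^2 = 1 \]

and defining new variables, $x_1$ and $y_1$, by the equations

\[ x = 3x_1 \quad \text{and} \quad y = 2y_1 \]

This enables us to reformulate the problem as follows:

maximize $z = 4xy = 24x_1y_1$

subject to the constraint

\[ x_1^2 + y_1^2 = 1 \]

To solve this problem, we will write the quadratic form $z = 24x_1y_1$ as

\[
z = \mathbf{x}^T A \mathbf{x} =
\begin{bmatrix} x_1 & y_1 \end{bmatrix}
\begin{bmatrix} 0 & 12 \\ 12 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
\]

We now leave it for you to show that the largest eigenvalue of $A$ is $\lambda = 12$ and that the only
corresponding unit eigenvector with nonnegative entries is

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \]

Thus, the maximum area is $z = 12$, and this occurs when

\[ x = 3x_1 = \frac{3}{\sqrt{2}} \quad \text{and} \quad y = 2y_1 = \frac{2}{\sqrt{2}} = \sqrt{2} \]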
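The change of variable used in Example 2 works for any ellipse of the form $x^2/p^2 + y^2/q^2 = 1$, and the sketch below records that observation in Python. It is an illustration under stated assumptions: the function name `max_inscribed_rectangle` and the parameters `p` and `q` (the semi-axes) are introduced here and do not appear in the text.

```python
import math

def max_inscribed_rectangle(p, q):
    """Largest rectangle with sides parallel to the axes inscribed in x^2/p^2 + y^2/q^2 = 1.

    Substituting x = p*x1 and y = q*y1 turns the area 4xy into 4pq*x1*y1 on the
    unit circle. The matrix of that form, [[0, 2pq], [2pq, 0]], has largest
    eigenvalue 2pq with unit eigenvector (1/sqrt(2), 1/sqrt(2)).
    """
    area = 2.0 * p * q
    x1 = y1 = 1.0 / math.sqrt(2.0)
    return area, p * x1, q * y1

# Example 2: 4x^2 + 9y^2 = 36 is x^2/9 + y^2/4 = 1, so p = 3 and q = 2.
area, x, y = max_inscribed_rectangle(3, 2)

# Independent check: sweep the first-quadrant corner (3cos t, 2sin t) over the ellipse.
sweep = max(4 * (3 * math.cos(t)) * (2 * math.sin(t))
            for t in (math.pi / 2 * k / 1000 for k in range(1001)))
```

With `p = 3` and `q = 2` the sketch returns the area and corner found in the example, and the brute-force sweep does not exceed that area.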



Constrained Extrema and Level Curves 



A useful way of visualizing the behavior of a function $f(x, y)$ of two variables is to consider the curves in the
$xy$-plane along which $f(x, y)$ is constant. These curves have equations of the form

\[ f(x, y) = k \]

and are called the level curves of $f$ (Figure 7.4.3). In particular, the level curves of a quadratic form $\mathbf{x}^T A \mathbf{x}$ on $R^2$
have equations of the form

\[ \mathbf{x}^T A \mathbf{x} = k \tag{2} \]

so the maximum and minimum values of $\mathbf{x}^T A \mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ are the largest and smallest
values of $k$ for which the graph of 2 intersects the unit circle. Typically, such values of $k$ produce level curves that
just touch the unit circle (Figure 7.4.4), and the coordinates of the points where the level curves just touch produce
the vectors that maximize or minimize $\mathbf{x}^T A \mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$.

Figure 7.4.3 (label: plane $z = k$)

Figure 7.4.4 (label: level curve $\mathbf{x}^T A \mathbf{x} = k$)



EXAMPLE 3 Example 1 Revisited Using Level Curves

In Example 1 (and its following remark) we found the maximum and minimum values of the
quadratic form

\[ z = 5x^2 + 5y^2 + 4xy \]

subject to the constraint $x^2 + y^2 = 1$. We showed that the constrained maximum is $z = 7$, and this is
attained at the points

\[ (x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \tag{3} \]

and that the constrained minimum is $z = 3$, and this is attained at the points

\[ (x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \tag{4} \]

Geometrically, this means that the level curve $5x^2 + 5y^2 + 4xy = 7$ should just touch the unit
circle at the points in 3, and the level curve $5x^2 + 5y^2 + 4xy = 3$ should just touch it at the points
in 4. All of this is consistent with Figure 7.4.5.

Figure 7.4.5 (label: unit circle $x^2 + y^2 = 1$)



CALCULUS REQUIRED 

Relative Extrema of Functions of Two Variables 

We will conclude this section by showing how quadratic forms can be used to study characteristics of real-valued 
functions of two variables. 

Recall that if a function $f(x, y)$ has first-order partial derivatives, then its relative maxima and minima, if any,
occur at points where

\[ f_x(x, y) = 0 \quad \text{and} \quad f_y(x, y) = 0 \]

These are called critical points of $f$. The specific behavior of $f$ at a critical point $(x_0, y_0)$ is determined by the sign
of

\[ D(x, y) = f(x, y) - f(x_0, y_0) \tag{5} \]

at points $(x, y)$ that are close to, but different from, $(x_0, y_0)$.

• If $D(x, y) > 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then
$f(x_0, y_0) < f(x, y)$ at those points and $f$ is said to have a relative minimum at $(x_0, y_0)$ (Figure 7.4.6a).

• If $D(x, y) < 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then
$f(x_0, y_0) > f(x, y)$ at those points and $f$ is said to have a relative maximum at $(x_0, y_0)$ (Figure 7.4.6b).

• If $D(x, y)$ has both positive and negative values inside every circle centered at $(x_0, y_0)$, then there are points
$(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) < f(x, y)$ and points $(x, y)$ that are
arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) > f(x, y)$. In this case we say that $f$ has a saddle point at
$(x_0, y_0)$ (Figure 7.4.6c).

Figure 7.4.6 (a) Relative minimum at (0, 0); (b) relative maximum at (0, 0); (c) saddle point at (0, 0)

In general, it can be difficult to determine the sign of 5 directly. However, the following theorem, which is proved 
in calculus, makes it possible to analyze critical points using derivatives. 

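THEOREM 7.4.2 Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that $f$ has continuous second-order partial
derivatives in some circular region centered at $(x_0, y_0)$. Then:

(a) $f$ has a relative minimum at $(x_0, y_0)$ if

\[ f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) > 0 \]

(b) $f$ has a relative maximum at $(x_0, y_0)$ if

\[ f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) < 0 \]

(c) $f$ has a saddle point at $(x_0, y_0)$ if

\[ f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) < 0 \]

(d) The test is inconclusive if

\[ f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) = 0 \]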



Our interest here is in showing how to reformulate this theorem using properties of symmetric matrices. For this
purpose we consider the symmetric matrix

\[
H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix}
\]

which is called the Hessian or Hessian matrix of $f$ in honor of the German mathematician and scientist Ludwig
Otto Hesse (1811-1874). The notation $H(x, y)$ emphasizes that the entries in the matrix depend on $x$ and $y$. The
Hessian is of interest because

\[
\det[H(x_0, y_0)] =
\begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix}
= f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0)
\]

is the expression that appears in Theorem 7.4.2. We can now reformulate the second derivative test as follows.



THEOREM 7.4.3 Hessian Form of the Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that $f$ has continuous second-order partial
derivatives in some circular region centered at $(x_0, y_0)$. If $H(x_0, y_0)$ is the Hessian of $f$ at $(x_0, y_0)$, then:

(a) $f$ has a relative minimum at $(x_0, y_0)$ if $H(x_0, y_0)$ is positive definite.

(b) $f$ has a relative maximum at $(x_0, y_0)$ if $H(x_0, y_0)$ is negative definite.

(c) $f$ has a saddle point at $(x_0, y_0)$ if $H(x_0, y_0)$ is indefinite.

(d) The test is inconclusive otherwise.



We will prove part (a). The proofs of the remaining parts will be left as exercises.

Proof (a) If $H(x_0, y_0)$ is positive definite, then Theorem 7.3.4 implies that the principal submatrices of
$H(x_0, y_0)$ have positive determinants. Thus,

\[
\det[H(x_0, y_0)] =
\begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix}
= f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0
\]

and

\[ \det[f_{xx}(x_0, y_0)] = f_{xx}(x_0, y_0) > 0 \]

so $f$ has a relative minimum at $(x_0, y_0)$ by part (a) of Theorem 7.4.2.



EXAMPLE 4 Using the Hessian to Classify Relative Extrema

Find the critical points of the function

\[ f(x, y) = \tfrac{1}{3}x^3 + xy^2 - 8xy + 3 \]

and use the eigenvalues of the Hessian matrix at those points to determine which of them, if any, are
relative maxima, relative minima, or saddle points.

Solution To find both the critical points and the Hessian matrix we will need to calculate the first
and second partial derivatives of $f$. These derivatives are

\[
f_x(x, y) = x^2 + y^2 - 8y, \quad f_y(x, y) = 2xy - 8x, \quad f_{xy}(x, y) = 2y - 8,
\]
\[
f_{xx}(x, y) = 2x, \quad f_{yy}(x, y) = 2x
\]

Thus, the Hessian matrix is

\[
H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix}
= \begin{bmatrix} 2x & 2y - 8 \\ 2y - 8 & 2x \end{bmatrix}
\]

To find the critical points we set $f_x$ and $f_y$ equal to zero. This yields the equations

\[ f_x(x, y) = x^2 + y^2 - 8y = 0 \quad \text{and} \quad f_y(x, y) = 2xy - 8x = 2x(y - 4) = 0 \]

Solving the second equation yields $x = 0$ or $y = 4$. Substituting $x = 0$ in the first equation and
solving for $y$ yields $y = 0$ or $y = 8$; and substituting $y = 4$ into the first equation and solving for $x$
yields $x = 4$ or $x = -4$. Thus, we have four critical points:

(0, 0), (0, 8), (4, 4), (-4, 4)

Evaluating the Hessian matrix at these points yields

\[
H(0, 0) = \begin{bmatrix} 0 & -8 \\ -8 & 0 \end{bmatrix}, \qquad
H(0, 8) = \begin{bmatrix} 0 & 8 \\ 8 & 0 \end{bmatrix},
\]
\[
H(4, 4) = \begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix}, \qquad
H(-4, 4) = \begin{bmatrix} -8 & 0 \\ 0 & -8 \end{bmatrix}
\]

We leave it for you to find the eigenvalues of these matrices and deduce the following classifications
of the stationary points:

Critical Point (x0, y0)    λ1     λ2     Classification
(0, 0)                      8     -8     Saddle point
(0, 8)                      8     -8     Saddle point
(4, 4)                      8      8     Relative minimum
(-4, 4)                    -8     -8     Relative maximum
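The classifications in the table can be reproduced with a short Python sketch. It is illustrative only: the function names `gradient`, `hessian`, and `classify` are hypothetical, the partial derivatives are entered by hand from the solution above, and the eigenvalues of the symmetric $2 \times 2$ Hessian are obtained from the standard closed form $\frac{f_{xx}+f_{yy}}{2} \pm \sqrt{\left(\frac{f_{xx}-f_{yy}}{2}\right)^2 + f_{xy}^2}$, which is an assumption brought in here rather than a formula from this section.

```python
import math

def gradient(x, y):
    """First partials (f_x, f_y) of f(x, y) = x**3/3 + x*y**2 - 8*x*y + 3."""
    return (x * x + y * y - 8 * y, 2 * x * y - 8 * x)

def hessian(x, y):
    """Second partials (f_xx, f_xy, f_yy) of the same function."""
    return (2 * x, 2 * y - 8, 2 * x)

def classify(fxx, fxy, fyy):
    """Hessian form of the second derivative test (Theorem 7.4.3)."""
    mean = (fxx + fyy) / 2.0
    radius = math.hypot((fxx - fyy) / 2.0, fxy)
    lam_high, lam_low = mean + radius, mean - radius
    if lam_low > 0:
        return "relative minimum"   # positive definite
    if lam_high < 0:
        return "relative maximum"   # negative definite
    if lam_low < 0 < lam_high:
        return "saddle point"       # indefinite
    return "inconclusive"

critical_points = [(0, 0), (0, 8), (4, 4), (-4, 4)]
results = {p: classify(*hessian(*p)) for p in critical_points}
```

Each of the four points makes both first partials vanish, and `results` matches the table row by row.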



OPTIONAL 

We conclude this section with an optional proof of Theorem 7.4.1. 

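Proof of Theorem 7.4.1 The first step in the proof is to show that $\mathbf{x}^T A \mathbf{x}$ has constrained maximum and minimum
values for $\|\mathbf{x}\| = 1$. Since $A$ is symmetric, the principal axes theorem (Theorem 7.3.1) implies that there is an
orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ such that

\[ \mathbf{x}^T A \mathbf{x} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{6} \]

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$. Let us assume that $\|\mathbf{x}\| = 1$ and that the column vectors of $P$
(which are unit eigenvectors of $A$) have been ordered so that

\[ \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \tag{7} \]

Since the matrix $P$ is orthogonal, multiplication by $P$ is length preserving, so that $\|\mathbf{y}\| = \|\mathbf{x}\| = 1$; that is,

\[ y_1^2 + y_2^2 + \cdots + y_n^2 = 1 \]

It follows from this equation and 7 that

\[
\lambda_n = \lambda_n (y_1^2 + y_2^2 + \cdots + y_n^2)
\leq \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2
\leq \lambda_1 (y_1^2 + y_2^2 + \cdots + y_n^2) = \lambda_1
\]

and hence from 6 that

\[ \lambda_n \leq \mathbf{x}^T A \mathbf{x} \leq \lambda_1 \]

This shows that all values of $\mathbf{x}^T A \mathbf{x}$ for which $\|\mathbf{x}\| = 1$ lie between the largest and smallest eigenvalues of $A$. Now
let $\mathbf{x}$ be a unit eigenvector corresponding to $\lambda_1$. Then

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{x}^T (\lambda_1 \mathbf{x}) = \lambda_1 \mathbf{x}^T \mathbf{x} = \lambda_1 \|\mathbf{x}\|^2 = \lambda_1 \]

which shows that $\mathbf{x}^T A \mathbf{x}$ has $\lambda_1$ as a constrained maximum and that this maximum occurs if $\mathbf{x}$ is a unit eigenvector
of $A$ corresponding to $\lambda_1$. Similarly, if $\mathbf{x}$ is a unit eigenvector corresponding to $\lambda_n$, then

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{x}^T (\lambda_n \mathbf{x}) = \lambda_n \mathbf{x}^T \mathbf{x} = \lambda_n \|\mathbf{x}\|^2 = \lambda_n \]

so $\mathbf{x}^T A \mathbf{x}$ has $\lambda_n$ as a constrained minimum and this minimum occurs if $\mathbf{x}$ is a unit eigenvector of $A$ corresponding
to $\lambda_n$. This completes the proof.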



Concept Review 

• Constraint 

• Constrained extremum 

• Level curve 

• Critical point 

• Relative minimum 

• Relative maximum 

• Saddle point 

• Second derivative test 

• Hessian matrix 

Skills 

• Find the maximum and minimum values of a quadratic form subject to a constraint. 

• Find the critical points of a real-valued function of two variables, and use the eigenvalues of the Hessian 
matrix at the critical points to classify them as relative maxima, relative minima, or saddle points. 



Exercise Set 7.4 

In Exercises 1-4, find the maximum and minimum values of the given quadratic form subject to the constraint
$x^2 + y^2 = 1$, and determine the values of $x$ and $y$ at which the maximum and minimum occur.

1. $5x^2 - y^2$
Answer:

Maximum: 5 at (1, 0) and (-1, 0); minimum: -1 at (0, 1) and (0, -1)

2. $xy$

3. $3x^2 + 7y^2$
Answer:

Maximum: 7 at (0, 1) and (0, -1); minimum: 3 at (1, 0) and (-1, 0)

4. + 5xy 

In Exercises 5-6, find the maximum and minimum values of the given quadratic form subject to the constraint

\[ x^2 + y^2 + z^2 = 1 \]

and determine the values of $x$, $y$, and $z$ at which the maximum and minimum occur.

5. $9x^2 + 4y^2 + 3z^2$



Answer: 



Maximum: 9 at (1, 0, 0) and (-1,0, 0); minimum: 3 at (0, 0, 1) and (0, 0,-1) 

6. + 2xy + 2xz 

7. Use the method of Example 2 to find the maximum and minimum values of xy subject to the constraint 
4x^ + 87^ = 16. 

Answer: 

Maximum: z = A}f2 (^,7) = ^2^, 2j and ^— 2^, — 2j; minimum: z= —A^ at 
(^,7) = (-2/2, 2) and (2/2, -2) 

8. Use the method of Example 2 to find the maximum and minimum values of $x^2 + xy + 2y^2$ subject to the
constraint $x^2 + 3y^2 = 16$.

In Exercises 9-10, draw the unit circle and the level curves corresponding to the given quadratic form. Show that 
the unit circle intersects each of these curves in exactly two places, label the intersection points, and verify that 
the constrained extrema occur at those points. 

9. 5x^^y^ 
Answer: 




(^) Show that the function / 7 ) = -x^-y^ has critical points at (0, 0), (1 , 1), and ( - 1 , - 1 ) . 

(b) Use the Hessian form of the second derivative test to show that $f$ has relative maxima at (1, 1) and (-1, -1)
and a saddle point at (0, 0).

12. (a) Show that the function $f(x, y) = x^3 - 6xy - y^3$ has critical points at (0, 0) and (-2, 2).

(b) Use the Hessian form of the second derivative test to show that $f$ has a relative maximum at (-2, 2) and a
saddle point at (0, 0).

In Exercises 13-16, find the critical points of $f$, if any, and classify them as relative maxima, relative minima, or
saddle points.

13. $f(x, y) = x^3 - 3xy - y^3$

Answer:

Critical points: (-1, 1), relative maximum; (0, 0), saddle point

15. $f(x, y) = x^2 + 2y^2 - x^2 y$

Answer:

Critical points: (0, 0), relative minimum; (2, 1) and (-2, 1), saddle points



16. 



17. A rectangle whose center is at the origin and whose sides are parallel to the coordinate axes is to be inscribed
in the ellipse $x^2 + 25y^2 = 25$. Use the method of Example 2 to find nonnegative values of $x$ and $y$ that
produce the inscribed rectangle with maximum area.

Answer:

Corner points: $\left(\pm\frac{5}{\sqrt{2}}, \pm\frac{1}{\sqrt{2}}\right)$

18. Suppose that the temperature at a point $(x, y)$ on a metal plate is $T(x, y) = 4x^2 - 4xy + y^2$. An ant, walking
on the plate, traverses a circle of radius 5 centered at the origin. What are the highest and lowest temperatures
encountered by the ant?

19. (a) Show that the functions

$f(x, y) = x^4 + y^4$ and $g(x, y) = x^4 - y^4$

have a critical point at (0, 0) but the second derivative test is inconclusive at that point. 

(b) Give a reasonable argument to show that $f$ has a relative minimum at (0, 0) and $g$ has a saddle point at
(0, 0).



20. Suppose that the Hessian matrix of a certain quadratic form $f(x, y)$ is

\[ H = \begin{bmatrix} 2 & 4 \\ 4 & 2 \end{bmatrix} \]

What can you say about the location and classification of the critical points of $f$?

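21. Suppose that $A$ is an $n \times n$ symmetric matrix and

\[ q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} \]

where $\mathbf{x}$ is a vector in $R^n$ that is expressed in column form. What can you say about the value of $q$ if $\mathbf{x}$ is a unit
eigenvector corresponding to an eigenvalue $\lambda$ of $A$?

Answer:
$q(\mathbf{x}) = \lambda$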



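22. Prove: If $\mathbf{x}^T A \mathbf{x}$ is a quadratic form whose minimum and maximum values subject to the constraint $\|\mathbf{x}\| = 1$
are $m$ and $M$, respectively, then for each number $c$ in the interval $m \le c \le M$, there is a unit vector $\mathbf{x}_c$ such that
$\mathbf{x}_c^T A \mathbf{x}_c = c$. [Hint: In the case where $m < M$, let $\mathbf{u}_m$ and $\mathbf{u}_M$ be unit eigenvectors of $A$ such that $\mathbf{u}_m^T A \mathbf{u}_m = m$
and $\mathbf{u}_M^T A \mathbf{u}_M = M$, and let

\[ \mathbf{x}_c = \sqrt{\frac{M - c}{M - m}}\,\mathbf{u}_m + \sqrt{\frac{c - m}{M - m}}\,\mathbf{u}_M \]

Show that $\mathbf{x}_c^T A \mathbf{x}_c = c$.]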

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) A quadratic form must have either a maximum or minimum value. 
Answer: 

False 

(b) The maximum value of a quadratic form $\mathbf{x}^T A \mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ occurs at a unit eigenvector
corresponding to the largest eigenvalue of $A$.

Answer: 

True 

(c) The Hessian matrix of a function $f$ with continuous second-order partial derivatives is a symmetric matrix.
Answer: 

True 

(d) If $(x_0, y_0)$ is a critical point of a function $f$ and the Hessian of $f$ at $(x_0, y_0)$ is 0, then $f$ has neither a relative
maximum nor a relative minimum at $(x_0, y_0)$.

Answer: 

False 

(e) If $A$ is a symmetric matrix and $\det(A) < 0$, then the minimum of $\mathbf{x}^T A \mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ is
negative.

Answer: 

True 
7.5 Hermitian, Unitary, and Normal Matrices



We know that every real symmetric matrix is orthogonally diagonalizable and that the real symmetric matrices 
are the only orthogonally diagonalizable matrices. In this section we will consider the diagonalization problem 
for complex matrices. 



Hermitian and Unitary Matrices 

The transpose operation is less important for complex matrices than for real matrices. A more useful operation 
for complex matrices is given in the following definition. 

DEFINITION 1

If $A$ is a complex matrix, then the conjugate transpose of $A$, denoted by $A^*$, is defined by

\[ A^* = \overline{A}^T \tag{1} \]



Remark Since part (b) of Theorem 5.3.2 states that $\overline{(A^T)} = (\overline{A})^T$, the order in which the transpose and
conjugation operations are performed in computing $A^* = \overline{A}^T$ does not matter. Moreover, in the case where $A$
has real entries we have $A^* = (\overline{A})^T = A^T$, so $A^*$ is the same as $A^T$ for real matrices.



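EXAMPLE 1 Conjugate Transpose

Find the conjugate transpose $A^*$ of the matrix

\[ A = \begin{bmatrix} 1+i & -i & 0 \\ 2 & 3-2i & i \end{bmatrix} \]

Solution We have

\[ \overline{A} = \begin{bmatrix} 1-i & i & 0 \\ 2 & 3+2i & -i \end{bmatrix} \quad\text{and hence}\quad A^* = \overline{A}^T = \begin{bmatrix} 1-i & 2 \\ i & 3+2i \\ 0 & -i \end{bmatrix} \]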
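The computation in Example 1 can be checked mechanically. The following is a minimal Python sketch, assuming a matrix is stored as a list of rows; the helper name `conjugate_transpose` is an illustrative choice, not notation from the text.

```python
def conjugate_transpose(matrix):
    """Return the conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(matrix), len(matrix[0])
    # Entry (j, i) of the result is the complex conjugate of entry (i, j).
    return [[complex(matrix[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]

# The 2 x 3 matrix A of Example 1.
A = [[1 + 1j, -1j, 0],
     [2, 3 - 2j, 1j]]

A_star = conjugate_transpose(A)
for row in A_star:
    print(row)
```

Applying `conjugate_transpose` twice returns the original matrix, which illustrates the property $(A^*)^* = A$.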



The following theorem, parts of which are given as exercises, shows that the basic algebraic 
properties of the conjugate transpose operation are similar to those of the transpose (compare to 
Theorem 1.4.8). 



THEOREM 7.5.1

If $k$ is a complex scalar, and if $A$, $B$, and $C$ are complex matrices whose sizes are such that the stated
operations can be performed, then:

(a) $(A^*)^* = A$

(b) $(A + B)^* = A^* + B^*$

(c) $(A - B)^* = A^* - B^*$

(d) $(kA)^* = \overline{k}A^*$

(e) $(AB)^* = B^*A^*$



Remark Note that the relationship $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v}}^T\mathbf{u}$ in Formula 5 of Section 5.3 can be expressed in terms of the
conjugate transpose as

\[ \mathbf{u} \cdot \mathbf{v} = \mathbf{v}^*\mathbf{u} \tag{2} \]

We are now ready to define two new classes of matrices that will be important in our study of diagonalization
in $C^n$.

DEFINITION 2

A square complex matrix $A$ is said to be unitary if

\[ A^{-1} = A^* \tag{3} \]

and is said to be Hermitian if

\[ A^* = A \tag{4} \]

Note that a unitary matrix can also be defined as a square complex matrix $A$ for which
$AA^* = A^*A = I$.

If $A$ is a real matrix, then $A^* = A^T$, in which case (3) becomes $A^{-1} = A^T$ and (4) becomes $A^T = A$. Thus, the
unitary matrices are complex generalizations of the real orthogonal matrices and Hermitian matrices are
complex generalizations of the real symmetric matrices.

EXAMPLE 2 Recognizing Hermitian Matrices M 

Hermitian matrices are easy to recognize because their diagonal entries are real (why?), and the 
entries that are symmetrically positioned across the main diagonal are complex conjugates. Thus, 
for example, we can tell by inspection that 

1 i 



A = 



—J Z—i 
1-j 2 + i 3 



is Hermitian. 



The fact that real symmetric matrices have real eigenvalues is a special case of the following more general 
result about Hermitian matrices, the proof of which is left for the exercises. 



THEOREM 7.5.2 



The eigenvalues of a Hermitian matrix are real numbers. 



The fact that eigenvectors from different eigenspaces of a real symmetric matrix are orthogonal is a special 
case of the following more general result about Hermitian matrices. 



THEOREM 7.5.3 



If $A$ is a Hermitian matrix, then eigenvectors from different eigenspaces are orthogonal.



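Proof Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors of $A$ corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$. Using Formula 2
and the facts that $\lambda_1 = \overline{\lambda_1}$, $\lambda_2 = \overline{\lambda_2}$, and $A = A^*$, we can write

\[ \lambda_1(\mathbf{v}_2 \cdot \mathbf{v}_1) = (\lambda_1\mathbf{v}_1)^*\mathbf{v}_2 = (A\mathbf{v}_1)^*\mathbf{v}_2 = (\mathbf{v}_1^*A^*)\mathbf{v}_2 = (\mathbf{v}_1^*A)\mathbf{v}_2 = \mathbf{v}_1^*(A\mathbf{v}_2) = \mathbf{v}_1^*(\lambda_2\mathbf{v}_2) = \lambda_2(\mathbf{v}_1^*\mathbf{v}_2) = \lambda_2(\mathbf{v}_2 \cdot \mathbf{v}_1) \]

This implies that $(\lambda_1 - \lambda_2)(\mathbf{v}_2 \cdot \mathbf{v}_1) = 0$ and hence that $\mathbf{v}_2 \cdot \mathbf{v}_1 = 0$ (since $\lambda_1 \ne \lambda_2$).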



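EXAMPLE 3 Eigenvalues and Eigenvectors of a Hermitian Matrix

Confirm that the Hermitian matrix

\[ A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \]

has real eigenvalues and that eigenvectors from different eigenspaces are orthogonal.

Solution The characteristic polynomial of $A$ is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda-2 & -1-i \\ -1+i & \lambda-3 \end{vmatrix} = (\lambda-2)(\lambda-3) - (-1-i)(-1+i) = (\lambda^2 - 5\lambda + 6) - 2 = (\lambda-1)(\lambda-4) \]

so the eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 4$, which are real. Bases for the eigenspaces of $A$ can be obtained
by solving the linear system

\[ \begin{bmatrix} \lambda-2 & -1-i \\ -1+i & \lambda-3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

with $\lambda = 1$ and with $\lambda = 4$. We leave it for you to do this and to show that the general solutions of these
systems are

\[ \lambda = 1:\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4:\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} \tfrac{1}{2}(1+i) \\ 1 \end{bmatrix} \]

Thus, bases for these eigenspaces are

\[ \lambda = 1:\ \mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4:\ \mathbf{v}_2 = \begin{bmatrix} \tfrac{1}{2}(1+i) \\ 1 \end{bmatrix} \]

The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal since

\[ \mathbf{v}_1 \cdot \mathbf{v}_2 = (-1-i)\,\overline{\left(\tfrac{1}{2}(1+i)\right)} + (1)(1) = \tfrac{1}{2}(-1-i)(1-i) + 1 = 0 \]

and hence all scalar multiples of them are also orthogonal.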
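The claims of Example 3 can also be confirmed numerically. The sketch below is a rough check in plain Python, assuming the matrix $A$ with rows $(2,\ 1+i)$ and $(1-i,\ 3)$ and the eigenvectors $(-1-i,\ 1)$ and $(\tfrac{1}{2}(1+i),\ 1)$ used in Examples 3 and 5; the helper names `dot` and `mat_vec` are hypothetical. For a $2 \times 2$ matrix the eigenvalues are found from the trace and determinant by the quadratic formula.

```python
import cmath

def dot(u, v):
    """Complex Euclidean inner product: sum of u_k times the conjugate of v_k."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def mat_vec(M, x):
    """Multiply the matrix M (list of rows) by the column vector x."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

A = [[2, 1 + 1j],
     [1 - 1j, 3]]

# Roots of the characteristic polynomial lambda^2 - (trace) lambda + det.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = cmath.sqrt(trace * trace - 4 * det)
eigenvalues = [(trace - root) / 2, (trace + root) / 2]

v1 = [-1 - 1j, 1]          # basis vector for the eigenspace of lambda = 1
v2 = [(1 + 1j) / 2, 1]     # basis vector for the eigenspace of lambda = 4

print(eigenvalues)         # both imaginary parts are zero
print(mat_vec(A, v1), mat_vec(A, v2))
print(dot(v1, v2))         # zero, so v1 and v2 are orthogonal
```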



Unitary matrices are not usually easy to recognize by inspection. However, the following analog of Theorems
7.1.1 and 7.1.3, part of which is proved in the exercises, provides a way of ascertaining whether a matrix is
unitary without computing its inverse.



THEOREM 7.5.4 

If $A$ is an $n \times n$ matrix with complex entries, then the following are equivalent.

(a) $A$ is unitary.

(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $C^n$.

(c) $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $C^n$.

(d) The column vectors of $A$ form an orthonormal set in $C^n$ with respect to the complex Euclidean
inner product.

(e) The row vectors of $A$ form an orthonormal set in $C^n$ with respect to the complex Euclidean inner
product.



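EXAMPLE 4 A Unitary Matrix

Use Theorem 7.5.4 to show that

\[ A = \begin{bmatrix} \tfrac{1}{2}(1+i) & \tfrac{1}{2}(1+i) \\ \tfrac{1}{2}(1-i) & \tfrac{1}{2}(-1+i) \end{bmatrix} \]

is unitary, and then find $A^{-1}$.

Solution We will show that the row vectors

\[ \mathbf{r}_1 = \begin{bmatrix} \tfrac{1}{2}(1+i) & \tfrac{1}{2}(1+i) \end{bmatrix} \quad\text{and}\quad \mathbf{r}_2 = \begin{bmatrix} \tfrac{1}{2}(1-i) & \tfrac{1}{2}(-1+i) \end{bmatrix} \]

are orthonormal. The relevant computations are

\[ \|\mathbf{r}_1\| = \sqrt{\left|\tfrac{1}{2}(1+i)\right|^2 + \left|\tfrac{1}{2}(1+i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1 \]

\[ \|\mathbf{r}_2\| = \sqrt{\left|\tfrac{1}{2}(1-i)\right|^2 + \left|\tfrac{1}{2}(-1+i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1 \]

\[ \mathbf{r}_1 \cdot \mathbf{r}_2 = \left(\tfrac{1}{2}(1+i)\right)\overline{\left(\tfrac{1}{2}(1-i)\right)} + \left(\tfrac{1}{2}(1+i)\right)\overline{\left(\tfrac{1}{2}(-1+i)\right)} = \left(\tfrac{1}{2}(1+i)\right)\left(\tfrac{1}{2}(1+i)\right) + \left(\tfrac{1}{2}(1+i)\right)\left(\tfrac{1}{2}(-1-i)\right) = \tfrac{i}{2} - \tfrac{i}{2} = 0 \]

Since we now know that $A$ is unitary, it follows that

\[ A^{-1} = A^* = \begin{bmatrix} \tfrac{1}{2}(1-i) & \tfrac{1}{2}(1+i) \\ \tfrac{1}{2}(1-i) & \tfrac{1}{2}(-1-i) \end{bmatrix} \]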
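One way to double-check Example 4 is to multiply $A$ by its conjugate transpose and compare the result with the identity matrix. The following Python sketch does this for the matrix whose rows are $\mathbf{r}_1 = (\tfrac{1}{2}(1+i),\ \tfrac{1}{2}(1+i))$ and $\mathbf{r}_2 = (\tfrac{1}{2}(1-i),\ \tfrac{1}{2}(-1+i))$; the helper names and the tolerance `tol` are assumptions made for illustration.

```python
def conjugate_transpose(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def mat_mul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_identity(M, tol=1e-12):
    """True if M agrees with the identity matrix up to the tolerance tol."""
    n = len(M)
    return all(abs(M[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

A = [[0.5 * (1 + 1j), 0.5 * (1 + 1j)],
     [0.5 * (1 - 1j), 0.5 * (-1 + 1j)]]

A_inv = conjugate_transpose(A)   # for a unitary matrix, the inverse is A*

print(is_identity(mat_mul(A, A_inv)), is_identity(mat_mul(A_inv, A)))
```

Both products reduce to the identity, which is the condition $AA^* = A^*A = I$ for a unitary matrix.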



Unitary Diagonalizability 

Since unitary matrices are the complex analogs of the real orthogonal matrices, the following definition is a 
natural generalization of orthogonal diagonalizability for real matrices. 

DEFINITION 3

A square complex matrix $A$ is said to be unitarily diagonalizable if there is a unitary matrix $P$ such that
$P^{-1}AP = D$ is a complex diagonal matrix. Any such matrix $P$ is said to unitarily diagonalize $A$.



Recall that a real symmetric $n \times n$ matrix $A$ has an orthonormal set of $n$ eigenvectors and is orthogonally
diagonalized by any $n \times n$ matrix whose column vectors are an orthonormal set of eigenvectors of $A$. Here is
the complex analog of that result.



THEOREM 7.5.5 

Every $n \times n$ Hermitian matrix $A$ has an orthonormal set of $n$ eigenvectors and is unitarily diagonalized
by any $n \times n$ matrix $P$ whose column vectors form an orthonormal set of eigenvectors of $A$.



The procedure for unitarily diagonalizing a Hermitian matrix A is exactly the same as that for orthogonally 
diagonalizing a symmetric matrix: 


Unitarily Diagonalizing a Hermitian Matrix 

Step 1. Find a basis for each eigenspace of A. 

Step 2. Apply the Gram-Schmidt process to each of these bases to obtain orthonormal bases for the 
eigenspaces. 



Step 3. Form the matrix P whose column vectors are the basis vectors obtained in Step 2. This will 
be a unitary matrix (Theorem 7.5.4) and will unitarily diagonalize A. 



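EXAMPLE 5 Unitary Diagonalization of a Hermitian Matrix

Find a matrix $P$ that unitarily diagonalizes the Hermitian matrix

\[ A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \]

Solution We showed in Example 3 that the eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 4$ and that bases
for the corresponding eigenspaces are

\[ \lambda = 1:\ \mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4:\ \mathbf{v}_2 = \begin{bmatrix} \tfrac{1}{2}(1+i) \\ 1 \end{bmatrix} \]

Since each eigenspace has only one basis vector, the Gram-Schmidt process is simply a matter of
normalizing these basis vectors. We leave it for you to show that

\[ \mathbf{p}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \begin{bmatrix} \frac{-1-i}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix} \quad\text{and}\quad \mathbf{p}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \begin{bmatrix} \frac{1+i}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix} \]

Thus, $A$ is unitarily diagonalized by the matrix

\[ P = [\mathbf{p}_1 \ \ \mathbf{p}_2] = \begin{bmatrix} \frac{-1-i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix} \]

Although it is a little tedious, you may want to check this result by showing that

\[ P^{-1}AP = \begin{bmatrix} \frac{-1+i}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1-i}{\sqrt{6}} & \frac{2}{\sqrt{6}} \end{bmatrix} \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \begin{bmatrix} \frac{-1-i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} \]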
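The tedious check suggested at the end of Example 5 is easy to delegate to a short program. The sketch below, in plain Python with floating-point arithmetic, assumes the matrix $A$ with rows $(2,\ 1+i)$, $(1-i,\ 3)$ and the matrix $P$ whose columns are $(-1-i,\ 1)/\sqrt{3}$ and $(1+i,\ 2)/\sqrt{6}$. Because $P$ is unitary, $P^{-1}$ is computed as $P^*$; the helper names are illustrative.

```python
import math

def conjugate_transpose(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def mat_mul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

s3, s6 = math.sqrt(3), math.sqrt(6)

A = [[2, 1 + 1j],
     [1 - 1j, 3]]

# Columns of P are the normalized eigenvectors p1 and p2.
P = [[(-1 - 1j) / s3, (1 + 1j) / s6],
     [1 / s3, 2 / s6]]

# P is unitary, so its inverse is its conjugate transpose.
D = mat_mul(mat_mul(conjugate_transpose(P), A), P)

for row in D:
    print([round(z.real, 10) + round(z.imag, 10) * 1j for z in row])
```

Up to rounding error, `D` is the diagonal matrix with diagonal entries 1 and 4, the eigenvalues of $A$ in the order of the columns of $P$.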



Skew-Symmetric and Skew-Hermitian Matrices 

In Exercise 37 of Section 1.7 we defined a square matrix with real entries to be skew-symmetric if $A^T = -A$.
A skew-symmetric matrix must have zeros on the main diagonal (why?), and each entry off the main diagonal
must be the negative of its mirror image about the main diagonal. Here is an example.

\[ A = \begin{bmatrix} 0 & 1 & -2 \\ -1 & 0 & 4 \\ 2 & -4 & 0 \end{bmatrix} \quad [\text{skew-symmetric}] \]

We leave it for you to confirm that $A^T = -A$.

The complex analogs of the skew-symmetric matrices are the matrices for which $A^* = -A$. Such matrices are
said to be skew-Hermitian.

Since a skew-Hermitian matrix $A$ has the property

\[ \overline{A}^T = -A \]

it must be that $A$ has zeros or pure imaginary numbers on the main diagonal (why?), and that the complex
conjugate of each entry off the main diagonal is the negative of its mirror image about the main diagonal. Here
is an example.

i 1-j 5" 



-5 i 0 



[ skew — Hermitian] 



Normal Matrices 

Hermitian matrices enjoy many, but not all, of the properties of real symmetric matrices. For example, we 
know that real symmetric matrices are orthogonally diagonalizable and Hermitian matrices are unitarily 
diagonalizable. However, whereas the real symmetric matrices are the only orthogonally diagonalizable 
matrices, the Hermitian matrices do not constitute the entire class of unitarily diagonalizable complex matrices; 
that is, there exist unitarily diagonalizable matrices that are not Hermitian. Specifically, it can be proved that a 
square complex matrix A is unitarily diagonalizable if and only if 

\[ AA^* = A^*A \]

Matrices with this property are said to be normal. Normal matrices include the Hermitian, skew-Hermitian, 
and unitary matrices in the complex case and the symmetric, skew-symmetric, and orthogonal matrices in the 
real case. The nonzero skew- symmetric matrices are particularly interesting because they are examples of real 
matrices that are not orthogonally diagonalizable but are unitarily diagonalizable. 
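The defining condition $AA^* = A^*A$ for a normal matrix is straightforward to test numerically. The Python sketch below applies it to the skew-symmetric example from this section, to the Hermitian matrix of Example 3, and to one matrix that fails the test. The function name `is_normal`, the tolerance, and the non-normal matrix are illustrative assumptions rather than material from the text.

```python
def conjugate_transpose(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def mat_mul(X, Y):
    """Ordinary matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_normal(M, tol=1e-12):
    """True if M commutes with its conjugate transpose (up to tol)."""
    M_star = conjugate_transpose(M)
    left, right = mat_mul(M, M_star), mat_mul(M_star, M)
    n = len(M)
    return all(abs(left[i][j] - right[i][j]) <= tol
               for i in range(n) for j in range(n))

skew_symmetric = [[0, 1, -2], [-1, 0, 4], [2, -4, 0]]   # real, A^T = -A
hermitian = [[2, 1 + 1j], [1 - 1j, 3]]                  # A* = A
not_normal = [[1, 1], [0, 1]]                           # hypothetical counterexample

print(is_normal(skew_symmetric), is_normal(hermitian), is_normal(not_normal))
```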



A Comparison of Eigenvalues 

We have seen that Hermitian matrices have real eigenvalues. In the exercises we will ask you to show that the 
eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary (have real part of zero) and that the 
eigenvalues of unitary matrices have modulus 1. These ideas are illustrated schematically in Figure 7.5.1. 



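Figure 7.5.1 [Schematic of the complex plane: pure imaginary eigenvalues (skew-Hermitian) lie on the imaginary axis, eigenvalues with $|\lambda| = 1$ (unitary) lie on the unit circle, and real eigenvalues (Hermitian) lie on the real axis.]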



Concept Review

• Conjugate transpose

• Unitary matrix

• Hermitian matrix

• Unitarily diagonalizable matrix

• Skew-symmetric matrix

• Skew-Hermitian matrix

• Normal matrix

Skills

• Find the conjugate transpose of a matrix.

• Be able to identify Hermitian matrices.

• Find the inverse of a unitary matrix.

• Find a unitary matrix that diagonalizes a Hermitian matrix.

Exercise Set 7.5

In Exercises 1-2, find $A^*$.



A = 



2i 1-i 
4 3 + 2 
5 + 1 0 



Answer: 



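In Exercises 3-4, substitute numbers for the ×'s so that $A$ is Hermitian.

3.
\[ A = \begin{bmatrix} 1 & i & 2-3i \\ \times & -3 & 1 \\ \times & \times & 2 \end{bmatrix} \]

Answer:

\[ A = \begin{bmatrix} 1 & i & 2-3i \\ -i & -3 & 1 \\ 2+3i & 1 & 2 \end{bmatrix} \]

4.
\[ A = \begin{bmatrix} 2 & 0 & 3+5i \\ \times & -4 & -i \\ \times & \times & 6 \end{bmatrix} \]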



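In Exercises 5-6, show that $A$ is not Hermitian for any choice of the ×'s.

5. (a)
\[ A = \begin{bmatrix} 1 & i & 2-3i \\ -i & -3 & \times \\ 2-3i & \times & \times \end{bmatrix} \]

(b)
\[ A = \begin{bmatrix} \times & \times & 3+5i \\ 0 & i & -i \\ 3-5i & i & \times \end{bmatrix} \]

Answer:

(a) $a_{13} \ne \overline{a_{31}}$

(b) $a_{22} \ne \overline{a_{22}}$

6. (a)
\[ A = \begin{bmatrix} 1 & 1+i & \times \\ 1+i & 7 & \times \\ 6-2i & \times & 0 \end{bmatrix} \]

(b)
\[ A = \begin{bmatrix} 1 & \times & 3+5i \\ \times & 3 & 1-i \\ 3-5i & \times & 2+i \end{bmatrix} \]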



In Exercises 7-8, verify that the eigenvalues of the Hermitian matrix $A$ are real and that eigenvectors from
different eigenspaces are orthogonal (see Theorem 7.5.3).



^-[2 + 3i -1 J 

8. . To 2il 



In Exercises 9-12, show that $A$ is unitary, and find $A^{-1}$.



A = 



1 

5 5 
5 5 



Answer: 



3 
5 



4 
"5 



-4i -li 



10. 



A = 



1 



_L 
/2 



11. 



4(-') 



Answer: 



2/2 2/2 

l-l-i|/'"3 -i-]f3 
2^ 2/2 



12. 



i4 = 



^(-1+0 +(1-0 



2 



In Exercises 13-18, find a unitary matrix $P$ that diagonalizes the Hermitian matrix $A$, and determine $P^{-1}AP$.



Answer: 



p= 



14. 









fe 


1 


2 






r 6 2 


1 2r 


[2-21 


4 



=i;:i 



Answer: 

1 \-i 



P = 



{I {z 
4= _L 

l/6 /3 



-1; :i 



16. 



[S-i -3 



17. 



i4 = 



5 0 0 
0 -1 

0 0 



Answer: 



P = 



18. 



i4 = 



0 

f6 

1 I ; 

2 



0 1 

-II I 0 

0 



f6 

2 



-2 0 0 
0 1 0 
0 0 5 



}f2 '{2 



-^i 2 

0 



0 
2 



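In Exercises 19-20, substitute numbers for the ×'s so that $A$ is skew-Hermitian.

19.
\[ A = \begin{bmatrix} 0 & i & 2-3i \\ \times & 0 & 1 \\ \times & \times & 4i \end{bmatrix} \]

Answer:

\[ A = \begin{bmatrix} 0 & i & 2-3i \\ i & 0 & 1 \\ -2-3i & -1 & 4i \end{bmatrix} \]

20.
\[ A = \begin{bmatrix} 0 & 0 & 3-5i \\ \times & 0 & -i \\ \times & \times & 0 \end{bmatrix} \]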



In Exercises 21-22, show that $A$ is not skew-Hermitian for any choice of the ×'s.



21. 



(a) 



A = 



(b) 



A = 



Answer: 



0 j 2-3i 

-i 0 X 

2 + 3i X X 

1 X 3-5i 



X 2j 
-3 + 5j j 



— J 

3i 



(a) $a_{13} \ne -\overline{a_{31}}$

(b) $a_{11} \ne -\overline{a_{11}}$



22. 



(a) 



A = 



(b) 



i4 = 



i X 2-3i 

X 0 1 H-i 

2 + 3j X 

0 -i 4+7i 

X 0 > 

-4-7i X 1 



In Exercises 23-24, verify that the eigenvalues of the skew-Hermitian matrix A are pure imaginary numbers. 
23. 



24. 



'-[,1, -'•■] 



0 3i 
3i 0 



In Exercises 25-26, show that A is normal. 



25. 



A = 



26. 



A = 



1 + 21 2 + i -2-i 

2-l-j 1+i -J 
-2-i -i 1+j 

2 \ 2i i 1 — i 
j -2i 1 - 3i 

\-i i_3i _3 + 8i 



27. Show that the matrix 




is unitary for all real values of $\theta$. [Note: See Formula 17 in Appendix B for the definition of $e^{i\theta}$.]

28. Prove that each entry on the main diagonal of a skew-Hermitian matrix is either zero or a pure imaginary 
number. 

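29. Let $A$ be any $n \times n$ matrix with complex entries, and define the matrices $B$ and $C$ to be

\[ B = \tfrac{1}{2}(A + A^*) \quad\text{and}\quad C = \tfrac{1}{2i}(A - A^*) \]

(a) Show that $B$ and $C$ are Hermitian.

(b) Show that $A = B + iC$ and $A^* = B - iC$.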

(c) What condition must B and C satisfy for A to be normal? 

Answer: 

(c) B and C must commute. 

30. Show that if $A$ is an $n \times n$ matrix with complex entries, and if $\mathbf{u}$ and $\mathbf{v}$ are vectors in $C^n$ that are expressed
in column form, then

\[ A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^*\mathbf{v} \quad\text{and}\quad \mathbf{u} \cdot A\mathbf{v} = A^*\mathbf{u} \cdot \mathbf{v} \]

31. Show that if $A$ is a unitary matrix, then so is $A^*$.

32. Show that the eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary.

33. Show that the eigenvalues of a unitary matrix have modulus 1.

34. Show that if $\mathbf{u}$ is a nonzero vector in $C^n$ that is expressed in column form, then $P = \mathbf{u}\mathbf{u}^*$ is Hermitian.

35. Show that if $\mathbf{u}$ is a unit vector in $C^n$ that is expressed in column form, then $H = I - 2\mathbf{u}\mathbf{u}^*$ is Hermitian and
unitary.

36. What can you say about the inverse of a matrix $A$ that is both Hermitian and unitary?

37. Find a $2 \times 2$ matrix that is both Hermitian and unitary and whose entries are not all real numbers.

Answer: 



1 





I 



1 





38. Under what conditions is the following matrix normal? 



i4 = 



0 0 
0 0 c 
0 i 0 



39. What geometric interpretations might you reasonably give to multiplication by the matrices $P = \mathbf{u}\mathbf{u}^*$ and
$H = I - 2\mathbf{u}\mathbf{u}^*$ in Exercises 34 and 35?

Answer:

Multiplication of $\mathbf{x}$ by $P$ corresponds to $\|\mathbf{u}\|^2$ times the orthogonal projection of $\mathbf{x}$ onto $W = \text{span}\{\mathbf{u}\}$. If
$\|\mathbf{u}\| = 1$, then multiplication of $\mathbf{x}$ by $H = I - 2\mathbf{u}\mathbf{u}^*$ corresponds to reflection of $\mathbf{x}$ about the hyperplane $\mathbf{u}^\perp$.

40. Prove that if $A$ is an invertible matrix, then $A^*$ is invertible, and $(A^*)^{-1} = (A^{-1})^*$.

41. (a) Prove that $\det(\overline{A}) = \overline{\det(A)}$.

(b) Use the result in part (a) and the fact that a square matrix and its transpose have the same determinant
to prove that $\det(A^*) = \overline{\det(A)}$.

42. Use part (b) of Exercise 41 to prove: 

(a) If A is Hermitian, then det(^) is real. 

(b) If $A$ is unitary, then $|\det(A)| = 1$.

43. Use properties of the transpose and complex conjugate to prove parts (a) and (e) of Theorem 7.5.1. 

44. Use properties of the transpose and complex conjugate to prove parts (b) and (d) of Theorem 7.5.1. 

45. Prove that an $n \times n$ matrix $A$ with complex entries is unitary if and only if the columns of $A$ form an
orthonormal set in $C^n$.

46. Prove that the eigenvalues of a Hermitian matrix are real. 

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer, 
(a) 



The matrix 

Answer: 

False 



0 I 

1 2 



is Hermitian. 



(b) 


i 

~f2 


i 

f6 


V5 




The matrix 


0 






is unitary. 




7? 




7? 





Answer: 



False 

(c) The conjugate transpose of a unitary matrix is unitary. 



Answer: 

True 

(d) Every unitarily diagonalizable matrix is Hermitian. 
Answer: 

False 

(e) A positive integer power of a skew-Hermitian matrix is skew-Hermitian. 
Answer: 

False 
Chapter 7 Supplementary Exercises



1. Verify that each matrix is orthogonal, and find its inverse. 



(a) 



(b) 



1 _i 
5 5 

4 3 

5 5 

_9_ 4 

25 5 

12. 1 
25 5 



_3 

5 

li 
"25 

li 

25 



Answer: 



(a) 



3 
5 
4 
5 



(b) 



4 
"5 
3 
5 

I » 

_2_ 1 
"25 5 

li 1 

25 5 



_3 
5 

12 
'25 

li 

25 



3 
5 
_4 

5 



4 
5 

0 
3 



_L ii 

'25 25 

4 3 

5 5 
12 li 

25 25 



2. Prove: If $Q$ is an orthogonal matrix, then each entry of $Q$ is the same as its cofactor if $\det(Q) = 1$ and is
the negative of its cofactor if $\det(Q) = -1$.

3. Prove that if $A$ is a positive definite symmetric matrix, and if $\mathbf{u}$ and $\mathbf{v}$ are vectors in $R^n$ in column form, then

\[ \langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T A \mathbf{v} \]

is an inner product on $R^n$.

4. Find the characteristic polynomial and the dimensions of the eigenspaces of the symmetric matrix 

"3 2 2" 
2 3 2 
2 2 3 



5. Find a matrix $P$ that orthogonally diagonalizes

\[ A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

and determine the diagonal matrix $D = P^T A P$.



Answer: 



P = 



0 

1 



0 0 

1 _L 

^ i/2 



0 0 0 

0 2 0 
0 0 1 



6. Express each quadratic form in the matrix notation $\mathbf{x}^T A \mathbf{x}$.

(a) -Ax\-^\6x2-\5x\X2 

(b) 9;: J — ^2 ^^3 ^^1^2 " 87: 1:^3 + X2^2 

7. Classify the quadratic form

\[ x_1^2 - 3x_1x_2 + 4x_2^2 \]

as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

Answer: 

positive definite 

8. Find an orthogonal change of variable that eliminates the cross product terms in each quadratic form, and 
express the quadratic form in terms of the new variables. 

(a) $-3x_1^2 + 5x_2^2 + 2x_1x_2$

(b) $-5x_1^2 + x_2^2 - x_3^2 + 6x_1x_3 + 4x_1x_2$

9. Identify the type of conic section represented by each equation. 

(a) $y - x^2 = 0$

(b) $3x - 11y^2 = 0$



Answer: 

(a) parabola 

(b) parabola 

10. Find a unitary matrix U that diagonalizes 



A = 



1 1 0 

0 1 1 

1 0 1 



and determine the diagonal matrix $D = U^{-1}AU$.
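11. Show that if $U$ is an $n \times n$ unitary matrix and

\[ |z_1| = |z_2| = \cdots = |z_n| = 1 \]

then the product

\[ U \begin{bmatrix} z_1 & 0 & \cdots & 0 \\ 0 & z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & z_n \end{bmatrix} \]

is also unitary.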
12. Suppose that $A^* = -A$.

(a) Show that $iA$ is Hermitian.

(b) Show that $A$ is unitarily diagonalizable and has pure imaginary eigenvalues.
Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



CHAPTER 8

Linear Transformations




CHAPTER CONTENTS 



8.1. General Linear Transformations 

8.2. Isomorphism 

8.3. Compositions and Inverse Transformations 

8.4. Matrices for General Linear Transformations 

8.5. Similarity 



INTRODUCTION

In Section 4.9 and Section 4.10 we studied linear transformations from $R^n$ to $R^m$. In this
chapter we will define and study linear transformations from a general vector space $V$ to a
general vector space $W$. The results we obtain here have important applications in physics,
engineering, and various branches of mathematics.



8.1 General Linear Transformations 

Up to now our study of linear transformations has focused on transformations from $R^n$ to $R^m$. In this section we
will turn our attention to linear transformations involving general vector spaces. We will illustrate ways in which
such transformations arise, and we will establish a fundamental relationship between general $n$-dimensional vector
spaces and $R^n$.



Definitions and Terminology 

In Section 4.9 we defined a matrix transformation $T_A: R^n \to R^m$ to be a mapping of the form

\[ T_A(\mathbf{x}) = A\mathbf{x} \]

in which $A$ is an $m \times n$ matrix. We subsequently established in Theorem 4.10.2 and Theorem 4.10.3 that the matrix
transformations are precisely the linear transformations from $R^n$ to $R^m$, that is, the transformations with the
linearity properties

\[ T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad\text{and}\quad T(k\mathbf{u}) = kT(\mathbf{u}) \]

We will use these two properties as the starting point for defining more general linear transformations.

DEFINITION 1

If $T: V \to W$ is a function from a vector space $V$ to a vector space $W$, then $T$ is called a linear
transformation from $V$ to $W$ if the following two properties hold for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ and for all
scalars $k$:

(i) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]

(ii) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]

In the special case where $V = W$, the linear transformation $T$ is called a linear operator on the vector space
$V$.



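The homogeneity and additivity properties of a linear transformation $T: V \to W$ can be used in combination to
show that if $\mathbf{v}_1$ and $\mathbf{v}_2$ are vectors in $V$ and $k_1$ and $k_2$ are any scalars, then

\[ T(k_1\mathbf{v}_1 + k_2\mathbf{v}_2) = k_1T(\mathbf{v}_1) + k_2T(\mathbf{v}_2) \]

More generally, if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are vectors in $V$ and $k_1, k_2, \ldots, k_r$ are any scalars, then

\[ T(k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r) = k_1T(\mathbf{v}_1) + k_2T(\mathbf{v}_2) + \cdots + k_rT(\mathbf{v}_r) \tag{1} \]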



The following theorem is an analog of parts (a) and (d) of Theorem 4.9.1. 



THEOREM 8.1.1 



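If $T: V \to W$ is a linear transformation, then:

(a) $T(\mathbf{0}) = \mathbf{0}$.

(b) $T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$ for all $\mathbf{u}$ and $\mathbf{v}$ in $V$.

Proof Let $\mathbf{u}$ be any vector in $V$. Since $0\mathbf{u} = \mathbf{0}$, it follows from the homogeneity property in Definition 1 that

\[ T(\mathbf{0}) = T(0\mathbf{u}) = 0T(\mathbf{u}) = \mathbf{0} \]

which proves (a).

We can prove part (b) by rewriting $T(\mathbf{u} - \mathbf{v})$ as

\[ T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u} + (-1)\mathbf{v}) = T(\mathbf{u}) + (-1)T(\mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v}) \]

We leave it for you to justify each step.

Use the two parts of Theorem 8.1.1 to prove that
$T(-\mathbf{v}) = -T(\mathbf{v})$
for all $\mathbf{v}$ in $V$.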

EXAMPLE 1 Matrix Transformations

Because we have based the definition of a general linear transformation on the homogeneity and
additivity properties of matrix transformations, it follows that a matrix transformation $T_A: R^n \to R^m$ is
also a linear transformation in this more general sense with $V = R^n$ and $W = R^m$.



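EXAMPLE 2 The Zero Transformation

Let $V$ and $W$ be any two vector spaces. The mapping $T: V \to W$ such that $T(\mathbf{v}) = \mathbf{0}$ for every $\mathbf{v}$ in $V$ is a
linear transformation called the zero transformation. To see that $T$ is linear, observe that

\[ T(\mathbf{u} + \mathbf{v}) = \mathbf{0}, \quad T(\mathbf{u}) = \mathbf{0}, \quad T(\mathbf{v}) = \mathbf{0}, \quad\text{and}\quad T(k\mathbf{u}) = \mathbf{0} \]

Therefore,

\[ T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad\text{and}\quad T(k\mathbf{u}) = kT(\mathbf{u}) \]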



EXAMPLE 3 The Identity Operator

Let $V$ be any vector space. The mapping $I: V \to V$ defined by $I(\mathbf{v}) = \mathbf{v}$ is called the identity operator on
$V$. We will leave it for you to verify that $I$ is linear.



EXAMPLE 4 Dilation and Contraction Operators

If $V$ is a vector space and $k$ is any scalar, then the mapping $T: V \to V$ given by $T(\mathbf{x}) = k\mathbf{x}$ is a linear
operator on $V$, for if $c$ is any scalar and if $\mathbf{u}$ and $\mathbf{v}$ are any vectors in $V$, then

\[ T(c\mathbf{u}) = k(c\mathbf{u}) = c(k\mathbf{u}) = cT(\mathbf{u}) \]

\[ T(\mathbf{u} + \mathbf{v}) = k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v} = T(\mathbf{u}) + T(\mathbf{v}) \]

If $0 < k < 1$, then $T$ is called the contraction of $V$ with factor $k$, and if $k > 1$, it is called the dilation of $V$
with factor $k$ (Figure 8.1.1).

Figure 8.1.1 [Two panels: dilation of $V$ and contraction of $V$]



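EXAMPLE 5 A Linear Transformation from $P_n$ to $P_{n+1}$

Let $\mathbf{p} = p(x) = c_0 + c_1x + \cdots + c_nx^n$ be a polynomial in $P_n$, and define the transformation $T: P_n \to P_{n+1}$ by

\[ T(\mathbf{p}) = T(p(x)) = xp(x) = c_0x + c_1x^2 + \cdots + c_nx^{n+1} \]

This transformation is linear because for any scalar $k$ and any polynomials $\mathbf{p}_1$ and $\mathbf{p}_2$ in $P_n$ we have

\[ T(k\mathbf{p}) = T(kp(x)) = x(kp(x)) = k(xp(x)) = kT(\mathbf{p}) \]

and

\[ T(\mathbf{p}_1 + \mathbf{p}_2) = T(p_1(x) + p_2(x)) = x(p_1(x) + p_2(x)) = xp_1(x) + xp_2(x) = T(\mathbf{p}_1) + T(\mathbf{p}_2) \]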
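In terms of coefficients, the transformation of Example 5 (multiplication of a polynomial by $x$) shifts every coefficient up by one degree. The following Python sketch illustrates this under the assumption that a polynomial $c_0 + c_1x + \cdots + c_nx^n$ is stored as the list `[c0, c1, ..., cn]`; the function names `T`, `scale`, and `add` are hypothetical helpers.

```python
def T(coeffs):
    """Multiply p(x) by x: the coefficient of x^k moves to x^(k+1)."""
    return [0] + list(coeffs)

def scale(k, p):
    """Scalar multiple k*p of a polynomial p."""
    return [k * c for c in p]

def add(p, q):
    """Sum of two polynomials stored with the same number of coefficients."""
    return [a + b for a, b in zip(p, q)]

p1 = [1, 2, 3]      # 1 + 2x + 3x^2
p2 = [4, 0, -1]     # 4 - x^2
k = 5

print(T(p1))                                  # coefficients of x + 2x^2 + 3x^3
print(T(scale(k, p1)) == scale(k, T(p1)))     # homogeneity
print(T(add(p1, p2)) == add(T(p1), T(p2)))    # additivity
```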



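EXAMPLE 6 A Linear Transformation Using an Inner Product

Let $V$ be an inner product space, let $\mathbf{v}_0$ be any fixed vector in $V$, and let $T: V \to R$ be the transformation

\[ T(\mathbf{x}) = \langle \mathbf{x}, \mathbf{v}_0 \rangle \]

that maps a vector $\mathbf{x}$ into its inner product with $\mathbf{v}_0$. This transformation is linear, for if $k$ is any scalar, and
if $\mathbf{u}$ and $\mathbf{v}$ are any vectors in $V$, then it follows from properties of inner products that

\[ T(k\mathbf{u}) = \langle k\mathbf{u}, \mathbf{v}_0 \rangle = k\langle \mathbf{u}, \mathbf{v}_0 \rangle = kT(\mathbf{u}) \]

\[ T(\mathbf{u} + \mathbf{v}) = \langle \mathbf{u} + \mathbf{v}, \mathbf{v}_0 \rangle = \langle \mathbf{u}, \mathbf{v}_0 \rangle + \langle \mathbf{v}, \mathbf{v}_0 \rangle = T(\mathbf{u}) + T(\mathbf{v}) \]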



EXAMPLE 7 Transformations on Matrix Spaces

Let $M_{nn}$ be the vector space of $n \times n$ matrices. In each part determine whether the transformation is
linear.

(a) $T_1(A) = A^T$

(b) $T_2(A) = \det(A)$

Solution

(a) It follows from parts (b) and (d) of Theorem 1.4.8 that

\[ T_1(kA) = (kA)^T = kA^T = kT_1(A) \]

\[ T_1(A + B) = (A + B)^T = A^T + B^T = T_1(A) + T_1(B) \]

so $T_1$ is linear.

(b) It follows from Formula 1 of Section 2.3 that

\[ T_2(kA) = \det(kA) = k^n\det(A) = k^nT_2(A) \]

Thus, $T_2$ is not homogeneous and hence not linear if $n > 1$. Note that additivity also fails
because we showed in Example 1 of Section 2.3 that $\det(A + B)$ and $\det(A) + \det(B)$ are not
generally equal.
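A small numerical experiment makes the failure of linearity for the determinant concrete. The Python sketch below uses two $2 \times 2$ matrices chosen purely for illustration (they do not come from the text) and checks homogeneity and additivity for the transpose and for the determinant; the helper names are assumptions.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix stored as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def transpose(M):
    return [list(col) for col in zip(*M)]

def scale(k, M):
    return [[k * entry for entry in row] for row in M]

def add(M, N):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(M, N)]

A = [[1, 2], [3, 4]]   # illustrative matrices, n = 2
B = [[0, 1], [1, 0]]
k = 3

# The transpose respects both linearity properties.
print(transpose(scale(k, A)) == scale(k, transpose(A)))
print(transpose(add(A, B)) == add(transpose(A), transpose(B)))

# The determinant does not: det(kA) = k^n det(A), and det(A + B) != det(A) + det(B).
print(det2(scale(k, A)), k * det2(A), k ** 2 * det2(A))
print(det2(add(A, B)), det2(A) + det2(B))
```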



EXAMPLE 8 Translation Is Not Linear

Part (a) of Theorem 8.1.1 states that a linear transformation maps $\mathbf{0}$ to $\mathbf{0}$. This property is useful for
identifying transformations that are not linear. For example, if $\mathbf{x}_0$ is a fixed nonzero vector in $R^2$, then
the transformation

\[ T(\mathbf{x}) = \mathbf{x} + \mathbf{x}_0 \]

has the geometric effect of translating each point $\mathbf{x}$ in a direction parallel to $\mathbf{x}_0$ through a distance of
$\|\mathbf{x}_0\|$ (Figure 8.1.2). This cannot be a linear transformation since $T(\mathbf{0}) = \mathbf{x}_0$, so $T$ does not map $\mathbf{0}$ to $\mathbf{0}$.

Figure 8.1.2 $T(\mathbf{x}) = \mathbf{x} + \mathbf{x}_0$ translates each point $\mathbf{x}$ along a line parallel to $\mathbf{x}_0$ through a distance
$\|\mathbf{x}_0\|$.



EXAMPLE 9 The Evaluation Transformation

Let $V$ be a subspace of $F(-\infty, \infty)$, let

\[ x_1, x_2, \ldots, x_n \]

be distinct real numbers, and let $T: V \to R^n$ be the transformation

\[ T(f) = (f(x_1), f(x_2), \ldots, f(x_n)) \tag{2} \]

that associates with $f$ the $n$-tuple of function values at $x_1, x_2, \ldots, x_n$. We call this the evaluation
transformation on $V$ at $x_1, x_2, \ldots, x_n$. Thus, for example, if

\[ x_1 = -1, \quad x_2 = 2, \quad x_3 = 4 \]

and if $f(x) = x^2 - 1$, then

\[ T(f) = (f(x_1), f(x_2), f(x_3)) = (0, 3, 15) \]

The evaluation transformation in (2) is linear, for if $k$ is any scalar, and if $f$ and $g$ are any functions in $V$,
then

\[ T(kf) = ((kf)(x_1), (kf)(x_2), \ldots, (kf)(x_n)) = (kf(x_1), kf(x_2), \ldots, kf(x_n)) = k(f(x_1), f(x_2), \ldots, f(x_n)) = kT(f) \]

and

\[ T(f + g) = ((f+g)(x_1), (f+g)(x_2), \ldots, (f+g)(x_n)) = (f(x_1) + g(x_1), f(x_2) + g(x_2), \ldots, f(x_n) + g(x_n)) \]
\[ = (f(x_1), f(x_2), \ldots, f(x_n)) + (g(x_1), g(x_2), \ldots, g(x_n)) = T(f) + T(g) \]
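The evaluation transformation translates directly into code: fix the sample points, then map a function to the tuple of its values there. The sketch below assumes the sample points $-1$, $2$, $4$ and takes $f(x) = x^2 - 1$, a choice that reproduces the values $(0, 3, 15)$ of the example; the second function `g` and the factory name `evaluation_transformation` are illustrative assumptions.

```python
def evaluation_transformation(points):
    """Return T, where T(f) is the tuple of values of f at the given points."""
    def T(f):
        return tuple(f(x) for x in points)
    return T

T = evaluation_transformation([-1, 2, 4])

f = lambda x: x ** 2 - 1
g = lambda x: 3 * x + 2        # a second function, chosen only for illustration
k = 7

print(T(f))

# Linearity check: T(kf) = kT(f) and T(f + g) = T(f) + T(g), componentwise.
T_kf = T(lambda x: k * f(x))
T_sum = T(lambda x: f(x) + g(x))
print(T_kf == tuple(k * value for value in T(f)))
print(T_sum == tuple(a + b for a, b in zip(T(f), T(g))))
```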



Finding Linear Transformations from Images of Basis Vectors 

We saw in Formula (12) of Section 4.9 that if $T: R^n \to R^m$ is a matrix transformation, say multiplication by $A$, and
if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$, then $A$ can be expressed as

\[ A = [T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)] \]

It follows from this that the image of any vector $\mathbf{v} = (c_1, c_2, \ldots, c_n)$ in $R^n$ under multiplication by $A$ can be
expressed as

\[ T(\mathbf{v}) = c_1T(\mathbf{e}_1) + c_2T(\mathbf{e}_2) + \cdots + c_nT(\mathbf{e}_n) \]

This formula tells us that for a matrix transformation the image of any vector is expressible as a linear combination
of the images of the standard basis vectors. This is a special case of the following more general result.

THEOREM 8.1.2 

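Let $T: V \to W$ be a linear transformation, where $V$ is finite-dimensional. If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis
for $V$, then the image of any vector $\mathbf{v}$ in $V$ can be expressed as

\[ T(\mathbf{v}) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) + \cdots + c_nT(\mathbf{v}_n) \tag{3} \]

where $c_1, c_2, \ldots, c_n$ are the coefficients required to express $\mathbf{v}$ as a linear combination of the vectors in $S$.

Proof Express $\mathbf{v}$ as $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ and use the linearity of $T$.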

EXAMPLE 10 Computing with Images of Basis Vectors

Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where

\[ \mathbf{v}_1 = (1, 1, 1), \quad \mathbf{v}_2 = (1, 1, 0), \quad \mathbf{v}_3 = (1, 0, 0) \]

Let $T: R^3 \to R^2$ be the linear transformation for which

\[ T(\mathbf{v}_1) = (1, 0), \quad T(\mathbf{v}_2) = (2, -1), \quad T(\mathbf{v}_3) = (4, 3) \]

Find a formula for $T(x_1, x_2, x_3)$, and then use that formula to compute $T(2, -3, 5)$.

Solution We first need to express $\mathbf{x} = (x_1, x_2, x_3)$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. If we
write

\[ (x_1, x_2, x_3) = c_1(1, 1, 1) + c_2(1, 1, 0) + c_3(1, 0, 0) \]

then on equating corresponding components, we obtain

\[ c_1 + c_2 + c_3 = x_1, \qquad c_1 + c_2 = x_2, \qquad c_1 = x_3 \]

which yields $c_1 = x_3$, $c_2 = x_2 - x_3$, $c_3 = x_1 - x_2$, so

\[ (x_1, x_2, x_3) = x_3(1, 1, 1) + (x_2 - x_3)(1, 1, 0) + (x_1 - x_2)(1, 0, 0) = x_3\mathbf{v}_1 + (x_2 - x_3)\mathbf{v}_2 + (x_1 - x_2)\mathbf{v}_3 \]

Thus

\[ T(x_1, x_2, x_3) = x_3T(\mathbf{v}_1) + (x_2 - x_3)T(\mathbf{v}_2) + (x_1 - x_2)T(\mathbf{v}_3) \]
\[ = x_3(1, 0) + (x_2 - x_3)(2, -1) + (x_1 - x_2)(4, 3) \]
\[ = (4x_1 - 2x_2 - x_3,\ 3x_1 - 4x_2 + x_3) \]

From this formula, we obtain

\[ T(2, -3, 5) = (9, 23) \]
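The procedure of Example 10 (find the coordinates of $\mathbf{x}$ relative to the basis, then combine the images of the basis vectors with those coordinates) can be sketched in a few lines of Python. The function name `T` and the tuple representation of vectors are illustrative assumptions; the coordinates $c_1 = x_3$, $c_2 = x_2 - x_3$, $c_3 = x_1 - x_2$ are specific to the basis $(1,1,1)$, $(1,1,0)$, $(1,0,0)$.

```python
# Images of the basis vectors v1 = (1,1,1), v2 = (1,1,0), v3 = (1,0,0).
IMAGES = [(1, 0), (2, -1), (4, 3)]

def T(x1, x2, x3):
    """Image of (x1, x2, x3), computed from the images of the basis vectors."""
    # Coordinates of (x1, x2, x3) relative to the basis {v1, v2, v3}.
    coords = (x3, x2 - x3, x1 - x2)
    # T(x) = c1*T(v1) + c2*T(v2) + c3*T(v3), computed component by component.
    return tuple(sum(c * image[i] for c, image in zip(coords, IMAGES))
                 for i in range(2))

print(T(2, -3, 5))
print(T(1, 1, 1), T(1, 1, 0), T(1, 0, 0))   # recovers the prescribed images
```

The result agrees with the closed formula $(4x_1 - 2x_2 - x_3,\ 3x_1 - 4x_2 + x_3)$ evaluated at $(2, -3, 5)$.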

CALCULUS REQUIRED

EXAMPLE 11 A Linear Transformation from $C^1(-\infty, \infty)$ to $F(-\infty, \infty)$

Let $V = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$, and let
$W = F(-\infty, \infty)$ be the vector space of all real-valued functions defined on $(-\infty, \infty)$. Let $D: V \to W$ be the
transformation that maps a function $\mathbf{f} = f(x)$ into its derivative, that is,

\[ D(\mathbf{f}) = f'(x) \]

From the properties of differentiation, we have

\[ D(\mathbf{f} + \mathbf{g}) = D(\mathbf{f}) + D(\mathbf{g}) \quad\text{and}\quad D(k\mathbf{f}) = kD(\mathbf{f}) \]

Thus, $D$ is a linear transformation.

CALCULUS REQUIRED

EXAMPLE 12 An Integral Transformation

Let $V = C(-\infty, \infty)$ be the vector space of continuous functions on the interval $(-\infty, \infty)$, let
$W = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$, and
let $J: V \to W$ be the transformation that maps a function $f$ in $V$ into

\[ J(f) = \int_0^x f(t)\,dt \]

For example, if $f(x) = x^2$, then

\[ J(f) = \int_0^x t^2\,dt = \left.\frac{t^3}{3}\right|_0^x = \frac{x^3}{3} \]

The transformation $J: V \to W$ is linear, for if $k$ is any constant, and if $f$ and $g$ are any functions in $V$, then
properties of the integral imply that

\[ J(kf) = \int_0^x kf(t)\,dt = k\int_0^x f(t)\,dt = kJ(f) \]

\[ J(f + g) = \int_0^x (f(t) + g(t))\,dt = \int_0^x f(t)\,dt + \int_0^x g(t)\,dt = J(f) + J(g) \]



Kernel and Range 

Recall that if $A$ is an $m \times n$ matrix, then the null space of $A$ consists of all vectors $\mathbf{x}$ in $R^n$ such that $A\mathbf{x} = \mathbf{0}$, and by
Theorem 4.7.1 the column space of $A$ consists of all vectors $\mathbf{b}$ in $R^m$ for which there is at least one vector $\mathbf{x}$ in $R^n$
such that $A\mathbf{x} = \mathbf{b}$. From the viewpoint of matrix transformations, the null space of $A$ consists of all vectors in $R^n$
that multiplication by $A$ maps into $\mathbf{0}$, and the column space of $A$ consists of all vectors in $R^m$ that are images of at
least one vector in $R^n$ under multiplication by $A$. The following definition extends these ideas to general linear
transformations.

r n 



DEFINITION 2 



liT:V —^W linear transformation, then the set of vectors in Kthat Tmaps into 0 is called the kernel of 
Tand is denoted by ker(^). The set of all vectors in Wthat are images under Tof at least one vector in Vis 
called the range of T and is denoted by R(t). 



L 



J 



EXAMPLE 13 Kernel and Range of a Matrix Transformation

If $T_A: R^n \to R^m$ is multiplication by the $m \times n$ matrix $A$, then, as discussed above, the kernel of $T_A$ is the null space of $A$, and the range of $T_A$ is the column space of $A$.

EXAMPLE 14 Kernel and Range of the Zero Transformation

Let $T: V \to W$ be the zero transformation. Since $T$ maps every vector in $V$ into $\mathbf{0}$, it follows that $\ker(T) = V$. Moreover, since $\mathbf{0}$ is the only image under $T$ of vectors in $V$, it follows that $R(T) = \{\mathbf{0}\}$.

EXAMPLE 15 Kernel and Range of the Identity Operator

Let $I: V \to V$ be the identity operator. Since $I(\mathbf{v}) = \mathbf{v}$ for all vectors in $V$, every vector in $V$ is the image of some vector (namely, itself); thus $R(I) = V$. Since the only vector that $I$ maps into $\mathbf{0}$ is $\mathbf{0}$, it follows that $\ker(I) = \{\mathbf{0}\}$.



EXAMPLE 16 Kernel and Range of an Orthogonal Projection

Let $T: R^3 \to R^3$ be the orthogonal projection on the $xy$-plane. As illustrated in Figure 8.1.3a, the points that $T$ maps into $\mathbf{0} = (0, 0, 0)$ are precisely those on the $z$-axis, so $\ker(T)$ is the set of points of the form $(0, 0, z)$. As illustrated in Figure 8.1.3b, $T$ maps the points in $R^3$ to the $xy$-plane, where each point in that plane is the image of each point on the vertical line above it. Thus, $R(T)$ is the set of points of the form $(x, y, 0)$.

Figure 8.1.3 (a) $\ker(T)$ is the $z$-axis. (b) $R(T)$ is the entire $xy$-plane.



EXAMPLE 17 Kernel and Range of a Rotation

Let $T: R^2 \to R^2$ be the linear operator that rotates each vector in the $xy$-plane through the angle $\theta$ (Figure 8.1.4). Since every vector in the $xy$-plane can be obtained by rotating some vector through the angle $\theta$, it follows that $R(T) = R^2$. Moreover, the only vector that rotates into $\mathbf{0}$ is $\mathbf{0}$, so $\ker(T) = \{\mathbf{0}\}$.

CALCULUS REQUIRED

EXAMPLE 18 Kernel of a Differentiation Transformation

Let $V = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$, let $W = F(-\infty, \infty)$ be the vector space of all real-valued functions defined on $(-\infty, \infty)$, and let $D: V \to W$ be the differentiation transformation $D(\mathbf{f}) = f'(x)$. The kernel of $D$ is the set of functions in $V$ with derivative zero. From calculus, this is the set of constant functions on $(-\infty, \infty)$.



Properties of Kernel and Range 

In all of the preceding examples, $\ker(T)$ and $R(T)$ turned out to be subspaces. In Example 14, Example 15, and Example 17 they were either the zero subspace or the entire vector space. In Example 16 the kernel was a line through the origin, and the range was a plane through the origin, both of which are subspaces of $R^3$. All of this is a consequence of the following general theorem.

THEOREM 8.1.3

If $T: V \to W$ is a linear transformation, then:

(a) The kernel of $T$ is a subspace of $V$.

(b) The range of $T$ is a subspace of $W$.

Proof (a) To show that $\ker(T)$ is a subspace, we must show that it contains at least one vector and is closed under addition and scalar multiplication. By part (a) of Theorem 8.1.1, the vector $\mathbf{0}$ is in $\ker(T)$, so the kernel contains at least one vector. Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be vectors in $\ker(T)$, and let $k$ be any scalar. Then

$T(\mathbf{v}_1 + \mathbf{v}_2) = T(\mathbf{v}_1) + T(\mathbf{v}_2) = \mathbf{0} + \mathbf{0} = \mathbf{0}$

so $\mathbf{v}_1 + \mathbf{v}_2$ is in $\ker(T)$. Also,

$T(k\mathbf{v}_1) = kT(\mathbf{v}_1) = k\mathbf{0} = \mathbf{0}$

so $k\mathbf{v}_1$ is in $\ker(T)$.

Proof (b) To show that $R(T)$ is a subspace of $W$, we must show that it contains at least one vector and is closed under addition and scalar multiplication. However, it contains at least the zero vector of $W$ since $T(\mathbf{0}) = \mathbf{0}$ by part (a) of Theorem 8.1.1. To prove that it is closed under addition and scalar multiplication, we must show that if $\mathbf{w}_1$ and $\mathbf{w}_2$ are vectors in $R(T)$, and if $k$ is any scalar, then there exist vectors $\mathbf{a}$ and $\mathbf{b}$ in $V$ for which

$T(\mathbf{a}) = \mathbf{w}_1 + \mathbf{w}_2$ and $T(\mathbf{b}) = k\mathbf{w}_1$ (4)

But the fact that $\mathbf{w}_1$ and $\mathbf{w}_2$ are in $R(T)$ tells us that there exist vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $V$ such that

$T(\mathbf{v}_1) = \mathbf{w}_1$ and $T(\mathbf{v}_2) = \mathbf{w}_2$

The following computations complete the proof by showing that the vectors $\mathbf{a} = \mathbf{v}_1 + \mathbf{v}_2$ and $\mathbf{b} = k\mathbf{v}_1$ satisfy the equations in (4):

$T(\mathbf{a}) = T(\mathbf{v}_1 + \mathbf{v}_2) = T(\mathbf{v}_1) + T(\mathbf{v}_2) = \mathbf{w}_1 + \mathbf{w}_2$
$T(\mathbf{b}) = T(k\mathbf{v}_1) = kT(\mathbf{v}_1) = k\mathbf{w}_1$
CALCULUS REQUIRED 

EXAMPLE 19 Application to Differential Equations

Differential equations of the form

$y'' + \omega^2 y = 0$ ($\omega$ a positive constant) (5)

arise in the study of vibrations. The set of all solutions of this equation on the interval $(-\infty, \infty)$ is the kernel of the linear transformation $D: C^2(-\infty, \infty) \to C(-\infty, \infty)$ given by

$D(y) = y'' + \omega^2 y$

It is proved in standard textbooks on differential equations that the kernel is a two-dimensional subspace of $C^2(-\infty, \infty)$, so that if we can find two linearly independent solutions of (5), then all other solutions can be expressed as linear combinations of those two. We leave it for you to confirm by differentiating that

$y_1 = \cos \omega x$ and $y_2 = \sin \omega x$

are solutions of (5). These functions are linearly independent since neither is a scalar multiple of the other, and thus

$y = c_1 \cos \omega x + c_2 \sin \omega x$ (6)

is a "general solution" of (5) in the sense that every choice of $c_1$ and $c_2$ produces a solution, and every solution is of this form.
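
The confirmation left to the reader in this example amounts to differentiating each function twice. A short worked version, offered as a sketch rather than as part of the original text, is:

```latex
\begin{aligned}
y_1 &= \cos \omega x, & y_1' &= -\omega \sin \omega x, & y_1'' &= -\omega^2 \cos \omega x = -\omega^2 y_1, \\
y_2 &= \sin \omega x, & y_2' &= \omega \cos \omega x, & y_2'' &= -\omega^2 \sin \omega x = -\omega^2 y_2.
\end{aligned}
```

In both cases $y'' + \omega^2 y = 0$, so each function lies in the kernel of the transformation $y \mapsto y'' + \omega^2 y$, and by linearity so does every combination $c_1 y_1 + c_2 y_2$.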



Rank and Nullity of Linear Transformations 



In Definition 1 of Section 4.8 we defined the notions of rank and nullity for an $m \times n$ matrix, and in Theorem 4.8.2, which we called the Dimension Theorem, we proved that the sum of the rank and nullity is $n$. We will show next that this result is a special case of a more general result about linear transformations. We start with the following definition.

DEFINITION 3

Let $T: V \to W$ be a linear transformation. If the range of $T$ is finite-dimensional, then its dimension is called the rank of $T$; and if the kernel of $T$ is finite-dimensional, then its dimension is called the nullity of $T$. The rank of $T$ is denoted by $\operatorname{rank}(T)$ and the nullity of $T$ by $\operatorname{nullity}(T)$.



The following theorem, whose proof is optional, generalizes Theorem 4.8.2. 

THEOREM 8.1.4 Dimension Theorem for Linear Transformations

If $T: V \to W$ is a linear transformation from an $n$-dimensional vector space $V$ to a vector space $W$, then

$\operatorname{rank}(T) + \operatorname{nullity}(T) = n$ (7)

In the special case where $A$ is an $m \times n$ matrix and $T_A: R^n \to R^m$ is multiplication by $A$, the kernel of $T_A$ is the null space of $A$, and the range of $T_A$ is the column space of $A$. Thus, it follows from Theorem 8.1.4 that

$\operatorname{rank}(T_A) + \operatorname{nullity}(T_A) = n$
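
For a matrix transformation, the dimension theorem can be observed directly from a row reduction: pivot columns count the rank, and each free column yields one kernel basis vector. The sketch below is illustrative only; the function names `rref` and `kernel_basis` and the sample matrix `A` are assumptions of this example, not notation from the text.

```python
from fractions import Fraction

def rref(matrix):
    """Reduced row echelon form with exact fractions.
    Returns the reduced rows and the list of pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    return rows, pivot_cols

def kernel_basis(matrix):
    """One basis vector of the null space for each free column."""
    rows, pivot_cols = rref(matrix)
    n_cols = len(matrix[0])
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for r, p in enumerate(pivot_cols):
            v[p] = -rows[r][free]
        basis.append(v)
    return basis

def mat_vec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Sample 3 x 3 matrix chosen for illustration.
A = [[1, -1, 3],
     [5, 6, -4],
     [7, 4, 2]]
rank_A = len(rref(A)[1])
nullity_A = len(kernel_basis(A))
print(rank_A, nullity_A, rank_A + nullity_A == len(A[0]))
```

Here the number of pivot columns plus the number of free columns is the number of columns $n$, which is exactly Formula (7) for $T_A$.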

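OPTIONAL

Proof of Theorem 8.1.4 We must show that

$\dim(R(T)) + \dim(\ker(T)) = n$

We will give the proof for the case where $1 \le \dim(\ker(T)) < n$. The cases where $\dim(\ker(T)) = 0$ and $\dim(\ker(T)) = n$ are left as exercises. Assume $\dim(\ker(T)) = r$, and let $\mathbf{v}_1, \ldots, \mathbf{v}_r$ be a basis for the kernel. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is linearly independent, Theorem 4.5.5b states that there are $n - r$ vectors, $\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n$, such that the extended set $\{\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$. To complete the proof, we will show that the $n - r$ vectors in the set $S = \{T(\mathbf{v}_{r+1}), \ldots, T(\mathbf{v}_n)\}$ form a basis for the range of $T$. It will then follow that

$\dim(R(T)) + \dim(\ker(T)) = (n - r) + r = n$

First we show that $S$ spans the range of $T$. If $\mathbf{b}$ is any vector in the range of $T$, then $\mathbf{b} = T(\mathbf{v})$ for some vector $\mathbf{v}$ in $V$. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$, the vector $\mathbf{v}$ can be written in the form

$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_r\mathbf{v}_r + c_{r+1}\mathbf{v}_{r+1} + \cdots + c_n\mathbf{v}_n$

Since $\mathbf{v}_1, \ldots, \mathbf{v}_r$ lie in the kernel of $T$, we have $T(\mathbf{v}_1) = \cdots = T(\mathbf{v}_r) = \mathbf{0}$, so

$\mathbf{b} = T(\mathbf{v}) = c_{r+1}T(\mathbf{v}_{r+1}) + \cdots + c_nT(\mathbf{v}_n)$

Thus $S$ spans the range of $T$.

Finally, we show that $S$ is a linearly independent set and consequently forms a basis for the range of $T$. Suppose that some linear combination of the vectors in $S$ is zero; that is,

$k_{r+1}T(\mathbf{v}_{r+1}) + \cdots + k_nT(\mathbf{v}_n) = \mathbf{0}$ (8)

We must show that $k_{r+1} = \cdots = k_n = 0$. Since $T$ is linear, (8) can be rewritten as

$T(k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n) = \mathbf{0}$

which says that $k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n$ is in the kernel of $T$. This vector can therefore be written as a linear combination of the basis vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$, say

$k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n = k_1\mathbf{v}_1 + \cdots + k_r\mathbf{v}_r$

Thus,

$k_1\mathbf{v}_1 + \cdots + k_r\mathbf{v}_r - k_{r+1}\mathbf{v}_{r+1} - \cdots - k_n\mathbf{v}_n = \mathbf{0}$

Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent, all of the $k$'s are zero; in particular, $k_{r+1} = \cdots = k_n = 0$, which completes the proof.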



Concept Review 

• Linear transformation 

• Linear operator 

• Zero transformation 

• Identity operator 

• Contraction 

• Dilation 

• Evaluation transformation 

• Kernel 

• Range 

• Rank 

• Nullity 

Skills 

• Determine whether a function is a linear transformation. 

• Find a formula for a linear transformation $T: V \to W$ given the values of $T$ on a basis for $V$.

• Find a basis for the kernel of a linear transformation. 

• Find a basis for the range of a linear transformation. 

• Find the rank of a linear transformation. 

• Find the nullity of a linear transformation. 



Exercise Set 8.1 



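In Exercises 1-8, determine whether the function is a linear transformation. Justify your answer.

1. $T: V \to R$, where $V$ is an inner product space, and $T(\mathbf{u}) = \|\mathbf{u}\|$.

Answer:

Nonlinear

2. $T: R^3 \to R^3$, where $\mathbf{v}_0$ is a fixed vector in $R^3$ and $T(\mathbf{u}) = \mathbf{u} \times \mathbf{v}_0$.

3. $T: M_{22} \to M_{23}$, where $B$ is a fixed $2 \times 3$ matrix and $T(A) = AB$.

Answer:

Linear

4. $T: M_{nn} \to R$, where $T(A) = \operatorname{tr}(A)$.

5. $F: M_{mn} \to M_{nm}$, where $F(A) = A^T$.

Answer:

Linear

6. $T: M_{22} \to R$, where

(a) $T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = 3a - 4b + c - d$

(b) $T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = a^2 + b^2$

7. $T: P_2 \to P_2$, where

(a) $T(a_0 + a_1x + a_2x^2) = a_0 + a_1(x + 1) + a_2(x + 1)^2$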

(b) T(ao + 

Answer: 

(a) Linear 

(b) Nonlinear 

8. $T: F(-\infty, \infty) \to F(-\infty, \infty)$, where

(a) ?'(/(;.)) = 

(b) $T(f(x)) = f(x + 1)$

9. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ for $R^2$, where $\mathbf{v}_1 = (1, 1)$ and $\mathbf{v}_2 = (1, 0)$, and let $T: R^2 \to R^2$ be the linear operator for which

$T(\mathbf{v}_1) = (1, -2)$ and $T(\mathbf{v}_2) = (-4, 1)$

Find a formula for $T(x_1, x_2)$, and use that formula to find $T(5, -3)$.

Answer:

$T(x_1, x_2) = (-4x_1 + 5x_2,\ x_1 - 3x_2)$; $T(5, -3) = (-35, 14)$

10. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ for $R^2$, where $\mathbf{v}_1 = (-2, 1)$ and $\mathbf{v}_2 = (1, 3)$, and let $T: R^2 \to R^3$ be the linear transformation such that

$T(\mathbf{v}_1) = (-1, 2, 0)$ and $T(\mathbf{v}_2) = (0, -3, 5)$

Find a formula for $T(x_1, x_2)$, and use that formula to find $T(2, -3)$.

11. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where $\mathbf{v}_1 = (1, 1, 1)$, $\mathbf{v}_2 = (1, 1, 0)$, and $\mathbf{v}_3 = (1, 0, 0)$, and let $T: R^3 \to R^3$ be the linear operator for which

$T(\mathbf{v}_1) = (2, -1, 4)$, $T(\mathbf{v}_2) = (3, 0, 1)$, $T(\mathbf{v}_3) = (-1, 5, 1)$

Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(2, 4, -1)$.

Answer:

$T(x_1, x_2, x_3) = (-x_1 + 4x_2 - x_3,\ 5x_1 - 5x_2 - x_3,\ x_1 + 3x_3)$; $T(2, 4, -1) = (15, -9, -1)$

12. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 9, 0)$, and $\mathbf{v}_3 = (3, 3, 4)$, and let $T: R^3 \to R^2$ be the linear transformation for which

$T(\mathbf{v}_1) = (1, 0)$, $T(\mathbf{v}_2) = (-1, 1)$, $T(\mathbf{v}_3) = (0, 1)$

Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(7, 13, 7)$.

13. Let $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ be vectors in a vector space $V$, and let $T: V \to R^3$ be a linear transformation for which

$T(\mathbf{v}_1) = (1, -1, 2)$, $T(\mathbf{v}_2) = (0, 3, 2)$, $T(\mathbf{v}_3) = (-3, 1, 2)$

Find $T(2\mathbf{v}_1 - 3\mathbf{v}_2 + 4\mathbf{v}_3)$.

Answer:

$T(2\mathbf{v}_1 - 3\mathbf{v}_2 + 4\mathbf{v}_3) = (-10, -7, 6)$

14. Let $T: R^2 \to R^2$ be the linear operator given by the formula

$T(x, y) = (2x - y,\ -8x + 4y)$

Which of the following vectors are in $R(T)$?

(a) $(1, -4)$

(b) $(5, 0)$

(c) $(-3, 12)$

15. Let $T: R^2 \to R^2$ be the linear operator in Exercise 14. Which of the following vectors are in $\ker(T)$?

(a) $(5, 10)$

(b) $(3, 2)$

(c) $(1, 1)$

Answer: 



(a) 



16. Let $T: R^4 \to R^3$ be the linear transformation given by the formula

$T(x_1, x_2, x_3, x_4) = (4x_1 + x_2 - 2x_3 - 3x_4,\ 2x_1 + x_2 + x_3 - 4x_4,\ 6x_1 - 9x_3 + 9x_4)$

Which of the following are in $R(T)$?

(a) $(0, 0, 6)$

(b) $(1, 3, 0)$

(c) $(2, 4, 1)$

17. Let $T: R^4 \to R^3$ be the linear transformation in Exercise 16. Which of the following are in $\ker(T)$?

(a) $(3, -8, 2, 0)$

(b) $(0, 0, 0, 1)$

(c) $(0, -4, 1, 0)$

Answer: 

(a) 

18. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x)$. Which of the following are in $\ker(T)$?

(a) 

(b) 0 
(c) 

19. Let $T: P_2 \to P_3$ be the linear transformation in Exercise 18. Which of the following are in $R(T)$?
(a) x + x^ 

(b) 

(c) 3-x^ 

Answer: 

(a) 

20. Find a basis for the kernel of 

(a) the linear operator in Exercise 14. 

(b) the linear transformation in Exercise 16. 

(c) the linear transformation in Exercise 18. 

21. Find a basis for the range of 

(a) the linear operator in Exercise 14. 

(b) the linear transformation in Exercise 16. 

(c) the linear transformation in Exercise 18. 

Answer: 



(a) $(1, -4)$

(b) $(4, 2, 6)$, $(1, 1, 0)$, $(-3, -4, 9)$

(c) $x$, $x^2$, $x^3$



22. Verify Formula 7 of the dimension theorem for 

(a) the linear operator in Exercise 14. 

(b) the linear transformation in Exercise 16. 

(c) the linear transformation in Exercise 18. 

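In Exercises 23-26, let $T$ be multiplication by the matrix $A$. Find

(a) a basis for the range of $T$.

(b) a basis for the kernel of $T$.

(c) the rank and nullity of $T$.

(d) the rank and nullity of $A$.

23. $A = \begin{bmatrix} 1 & -1 & 3 \\ 5 & 6 & -4 \\ 7 & 4 & 2 \end{bmatrix}$

Answer:

(a) $\begin{bmatrix} 1 \\ 5 \\ 7 \end{bmatrix}$, $\begin{bmatrix} -1 \\ 6 \\ 4 \end{bmatrix}$

(b) $\begin{bmatrix} -14 \\ 19 \\ 11 \end{bmatrix}$

(c) Rank($T$) = 2, nullity($T$) = 1

(d) Rank($A$) = 2, nullity($A$) = 1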



24. 



25. 



2 0-1 

A= 4 0-2 

20 0 0 

[=P» 1 5 21 
[l 2 3 Oj 



Answer: 



(a) 



(b) 










-1 








1 




0 




0 




7 



(c) Rank($T$) = nullity($T$) = 2

(d) Rank($A$) = nullity($A$) = 2



26. 



i4 = 



1 4 
3 -2 



5 0 9 

1 0 -1 

•1 0 -1 

5 1 8 



-1 

2 



0 - 

3 



27. Describe the kernel and range of

(a) the orthogonal projection on the $xz$-plane.

(b) the orthogonal projection on the $yz$-plane.

(c) the orthogonal projection on the plane defined by the equation $y = x$.

Answer:

(a) Kernel: $y$-axis; range: $xz$-plane

(b) Kernel: $x$-axis; range: $yz$-plane

(c) Kernel: the line through the origin perpendicular to the plane $y = x$; range: plane $y = x$

28. Let $V$ be any vector space, and let $T: V \to V$ be defined by $T(\mathbf{v}) = 3\mathbf{v}$.

(a) What is the kernel of $T$?

(b) What is the range of $T$?

29. In each part, use the given information to find the nullity of the linear transformation T. 

(a) T:R^ — ► has rank 3. 

(b) $T: P_4 \to P_2$ has rank 1.

(c) The range of T:R^ — ^ R^ is R^. 

(d) $T: M_{22} \to M_{22}$ has rank 3.



Answer:

(a) Nullity(T) = 2

(b) Nullity(T)=4 

(c) Nullity(T) = 3 

(d) Nullity(T) = 1 

30. Let $A$ be a $7 \times 6$ matrix such that $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, and let $T: R^6 \to R^7$ be multiplication by $A$. Find the rank and nullity of $T$.

31. Let $A$ be a $5 \times 7$ matrix with rank 4.

(a) What is the dimension of the solution space of $A\mathbf{x} = \mathbf{0}$?

(b) Is $A\mathbf{x} = \mathbf{b}$ consistent for all vectors $\mathbf{b}$ in $R^5$? Explain.

Answer:

(a) 3

(b) No

32. Let $T: R^3 \to W$ be a linear transformation from $R^3$ to any vector space. Give a geometric description of $\ker(T)$.

33. Let $T: V \to R^3$ be a linear transformation from any vector space to $R^3$. Give a geometric description of $R(T)$.

Answer:

A line through the origin, a plane through the origin, the origin only, or all of $R^3$

34. Let $T: R^3 \to R^3$ be multiplication by

$\begin{bmatrix} 1 & 3 & 4 \\ 3 & 4 & 7 \\ -2 & 2 & 0 \end{bmatrix}$

(a) Show that the kernel of $T$ is a line through the origin, and find parametric equations for it.

(b) Show that the range of $T$ is a plane through the origin, and find an equation for it.

35. (a) Show that if $a_1$, $a_2$, $b_1$, and $b_2$ are any scalars, then the formula

$F(x, y) = (a_1x + b_1y,\ a_2x + b_2y)$

defines a linear operator on $R^2$.

(b) Does the formula $F(x, y) = (a_1x^2 + b_1y^2,\ a_2x^2 + b_2y^2)$ define a linear operator on $R^2$? Explain.

Answer:

(b) No

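36. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$, and let $T: V \to W$ be a linear transformation. Show that if

$T(\mathbf{v}_1) = T(\mathbf{v}_2) = \cdots = T(\mathbf{v}_n) = \mathbf{0}$

then $T$ is the zero transformation.

37. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$, and let $T: V \to V$ be a linear operator. Show that if

$T(\mathbf{v}_1) = \mathbf{v}_1,\ T(\mathbf{v}_2) = \mathbf{v}_2,\ \ldots,\ T(\mathbf{v}_n) = \mathbf{v}_n$

then $T$ is the identity transformation on $V$.

38. For a positive integer $n > 1$, let $T: M_{nn} \to R$ be the linear transformation defined by $T(A) = \operatorname{tr}(A)$, where $A$ is an $n \times n$ matrix with real entries. Determine the dimension of $\ker(T)$.

39. Prove: If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for $V$ and $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ are vectors in $W$, not necessarily distinct, then there exists a linear transformation $T: V \to W$ such that

$T(\mathbf{v}_1) = \mathbf{w}_1,\ T(\mathbf{v}_2) = \mathbf{w}_2,\ \ldots,\ T(\mathbf{v}_n) = \mathbf{w}_n$

40. (Calculus required) Let $V = C[a, b]$ be the vector space of functions continuous on $[a, b]$, and let $T: V \to V$ be the transformation defined by

Is $T$ a linear operator?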

41. (Calculus required) Let $D: P_2 \to P_2$ be the differentiation transformation $D(\mathbf{p}) = p'(x)$. What is the kernel of $D$?

Answer:

$\ker(D)$ consists of all constant polynomials.



' (Calculus required) Let J-.P^—^ R be the integration transformation 




p(x)dx- What is the kernel 



of J? 

43. (Calculus required) Let $V$ be the vector space of real-valued functions with continuous derivatives of all orders on the interval $(-\infty, \infty)$, and let $W = F(-\infty, \infty)$ be the vector space of real-valued functions defined on $(-\infty, \infty)$.

(a) Find a linear transformation $T: V \to W$ whose kernel is $P_3$.

(b) Find a linear transformation $T: V \to W$ whose kernel is $P_n$.

Answer:

(a) $T(f(x)) = f^{(4)}(x)$

(b) $T(f(x)) = f^{(n+1)}(x)$

44. If $A$ is an $m \times n$ matrix, and if the linear system $A\mathbf{x} = \mathbf{b}$ is consistent for every vector $\mathbf{b}$ in $R^m$, what can you say about the range of $T_A: R^n \to R^m$?

True-False Exercises 

In parts (a)-(i) determine whether the statement is true or false, and justify your answer. 

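(a) If $T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2)$ for all vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $V$ and all scalars $c_1$ and $c_2$, then $T$ is a linear transformation.

Answer:

True

(b) If $\mathbf{v}$ is a nonzero vector in $V$, then there is exactly one linear transformation $T: V \to W$ such that $T(-\mathbf{v}) = -T(\mathbf{v})$.

Answer:

False

(c) There is exactly one linear transformation $T: V \to W$ for which $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u} - \mathbf{v})$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$.

Answer:

True

(d) If $\mathbf{v}_0$ is a nonzero vector in $V$, then the formula $T(\mathbf{v}) = \mathbf{v}_0 + \mathbf{v}$ defines a linear operator on $V$.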
Answer: 

False 

(e) The kernel of a linear transformation is a vector space. 
Answer: 



True 



(f) The range of a linear transformation is a vector space. 
Answer: 

True 

(g) If T.P^ — ► M21 is a linear transformation, then the nullity of T is 3. 
Answer: 

False 

(h) The function $T: M_{22} \to R$ defined by $T(A) = \det A$ is a linear transformation.
Answer: 
False 

(i) The linear transformation $T: M_{22} \to M_{22}$ defined by

$T(A) = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} A$

has rank 1.

Answer:

False


8.2 Isomorphism 

In this section we will establish a fundamental connection between real finite-dimensional vector spaces and the Euclidean space $R^n$. This connection is not only important theoretically, but it has practical applications in that it allows us to perform vector computations in general vector spaces by working with the vectors in $R^n$.



One-to-One and Onto 



Although many of the theorems in this text have been concerned exclusively with the vector space $R^n$, this is not as limiting as it might seem. As we will show, the vector space $R^n$ is the "mother" of all real $n$-dimensional vector spaces in the sense that any such space might differ from $R^n$ in the notation used to represent vectors, but not in its algebraic structure. To explain what we mean by this, we will need two definitions, the first of which is a generalization of Definition 1 in Section 4.10. (See Figure 8.2.1.)

DEFINITION 1

If $T: V \to W$ is a linear transformation from a vector space $V$ to a vector space $W$, then $T$ is said to be one-to-one if $T$ maps distinct vectors in $V$ into distinct vectors in $W$.

DEFINITION 2

If $T: V \to W$ is a linear transformation from a vector space $V$ to a vector space $W$, then $T$ is said to be onto (or onto $W$) if every vector in $W$ is the image of at least one vector in $V$.

Figure 8.2.1 Four panels. One-to-one: distinct vectors in $V$ have distinct images in $W$. Not one-to-one: there exist distinct vectors in $V$ with the same image. Onto $W$: every vector in $W$ is the image of some vector in $V$. Not onto $W$: not every vector in $W$ is the image of some vector in $V$.



The following theorem provides a useful way of telling whether a linear transformation is one-to-one by examining its kernel.

THEOREM 8.2.1

If $T: V \to W$ is a linear transformation, then the following statements are equivalent.

(a) $T$ is one-to-one.

(b) $\ker(T) = \{\mathbf{0}\}$.

Proof (a) $\Rightarrow$ (b) Since $T$ is linear, we know that $T(\mathbf{0}) = \mathbf{0}$ by Theorem 8.1.1a. Since $T$ is one-to-one, there can be no other vectors in $V$ that map into $\mathbf{0}$, so $\ker(T) = \{\mathbf{0}\}$.

(b) $\Rightarrow$ (a) Assume that $\ker(T) = \{\mathbf{0}\}$. If $\mathbf{u}$ and $\mathbf{v}$ are distinct vectors in $V$, then $\mathbf{u} - \mathbf{v} \ne \mathbf{0}$. This implies that $T(\mathbf{u} - \mathbf{v}) \ne \mathbf{0}$, for otherwise $\ker(T)$ would contain a nonzero vector. Since $T$ is linear, it follows that

$T(\mathbf{u}) - T(\mathbf{v}) = T(\mathbf{u} - \mathbf{v}) \ne \mathbf{0}$

so $T$ maps distinct vectors in $V$ into distinct vectors in $W$ and hence is one-to-one.

In the special case where $V$ is finite-dimensional and $T$ is a linear operator on $V$, we can add a third statement to those in Theorem 8.2.1.

THEOREM 8.2.2

If $V$ is a finite-dimensional vector space, and if $T: V \to V$ is a linear operator, then the following statements are equivalent.

(a) $T$ is one-to-one.

(b) $\ker(T) = \{\mathbf{0}\}$.

(c) $T$ is onto [i.e., $R(T) = V$].

Proof We already know that (a) and (b) are equivalent by Theorem 8.2.1, so it suffices to show that (b) and (c) are equivalent. We leave it for you to do this by assuming that $\dim(V) = n$ and applying Theorem 8.1.4.

EXAMPLE 1 Dilations and Contractions Are One-to-One and Onto

Show that if $V$ is a finite-dimensional vector space and $c$ is any nonzero scalar, then the linear operator $T: V \to V$ defined by $T(\mathbf{v}) = c\mathbf{v}$ is one-to-one and onto.

Solution The operator $T$ is onto (and hence one-to-one), for if $\mathbf{v}$ is any vector in $V$, then that vector is the image of the vector $(1/c)\mathbf{v}$.

EXAMPLE 2 Matrix Operators

If $T_A: R^n \to R^n$ is the matrix operator $T_A(\mathbf{x}) = A\mathbf{x}$, then it follows from parts (r) and (s) of Theorem 5.1.6 that $T_A$ is one-to-one and onto if and only if $A$ is invertible.
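
For a matrix transformation $T_A: R^n \to R^m$, the kernel criterion can be tested through the rank: $T_A$ is one-to-one exactly when the rank equals the number of columns $n$ (so the nullity is 0), and onto exactly when the rank equals the number of rows $m$. The following Python sketch illustrates this; the function names and the sample matrices are assumptions made for this example.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        pivot = next((r for r in range(pivot_row, n_rows)
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(pivot_row + 1, n_rows):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [a - factor * b
                       for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return pivot_row

def is_one_to_one(matrix):
    # ker(T_A) = {0} exactly when nullity = n - rank = 0.
    return rank(matrix) == len(matrix[0])

def is_onto(matrix):
    # R(T_A) = R^m exactly when the column space has dimension m.
    return rank(matrix) == len(matrix)

singular = [[1, -2], [2, -4]]      # second row is twice the first
tall = [[4, -2], [1, 5], [5, 3]]   # maps R^2 into R^3
invertible = [[2, 1], [1, 1]]      # determinant 1

for name, m in [("singular", singular), ("tall", tall),
                ("invertible", invertible)]:
    print(name, is_one_to_one(m), is_onto(m))
```

For a square matrix the two tests agree, which is the matrix form of Theorem 8.2.2; for the $3 \times 2$ matrix they differ, since a transformation from $R^2$ into $R^3$ cannot be onto.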



EXAMPLE 3 Shifting Operators

Let $V = R^\infty$ be the sequence space discussed in Example 3 of Section 4.1, and consider the linear "shifting operators" on $V$ defined by

$T_1(u_1, u_2, \ldots, u_n, \ldots) = (0, u_1, u_2, \ldots, u_n, \ldots)$
$T_2(u_1, u_2, \ldots, u_n, \ldots) = (u_2, u_3, \ldots, u_n, \ldots)$

(a) Show that $T_1$ is one-to-one but not onto.

(b) Show that $T_2$ is onto but not one-to-one.

Solution

(a) The operator $T_1$ is one-to-one because distinct sequences in $R^\infty$ obviously have distinct images. This operator is not onto because no vector in $R^\infty$ maps into the sequence $(1, 0, 0, \ldots, 0, \ldots)$, for example.

(b) The operator $T_2$ is not one-to-one because, for example, the vectors $(1, 0, 0, \ldots, 0, \ldots)$ and $(2, 0, 0, \ldots, 0, \ldots)$ both map into $(0, 0, 0, \ldots, 0, \ldots)$. This operator is onto because every possible sequence of real numbers can be obtained with an appropriate choice of the numbers $u_2, u_3, \ldots, u_n, \ldots$.
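
The behavior of the two shifting operators can be explored with a small Python sketch. It is an illustration only: an infinite sequence is modeled as a function `n -> u_n` for `n = 1, 2, ...`, the names `shift_right`, `shift_left`, and `prefix` are hypothetical, and comparing finite prefixes is of course not a proof.

```python
def shift_right(u):
    """T1: (u1, u2, ...) -> (0, u1, u2, ...)."""
    return lambda n: 0 if n == 1 else u(n - 1)

def shift_left(u):
    """T2: (u1, u2, ...) -> (u2, u3, ...)."""
    return lambda n: u(n + 1)

def prefix(u, length=6):
    """First `length` terms of the sequence u, as a list."""
    return [u(n) for n in range(1, length + 1)]

def e1(n):
    return 1 if n == 1 else 0      # the sequence (1, 0, 0, ...)

def two_e1(n):
    return 2 if n == 1 else 0      # the sequence (2, 0, 0, ...)

def squares(n):
    return n * n                   # the sequence (1, 4, 9, ...)

# T2 sends two different sequences to the zero sequence: not one-to-one.
print(prefix(shift_left(e1)), prefix(shift_left(two_e1)))

# Every image under T1 starts with 0, so (1, 0, 0, ...) is never reached.
print(prefix(shift_right(squares)))

# T2(T1(u)) = u, so every sequence u is the image under T2 of T1(u).
print(prefix(shift_left(shift_right(squares))) == prefix(squares))
```

The last check reflects why $T_2$ is onto: $T_1(\mathbf{u})$ is always a preimage of $\mathbf{u}$ under $T_2$. Applying the operators in the other order loses the first term, so $T_1(T_2(\mathbf{u}))$ need not equal $\mathbf{u}$.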



Why does Example 3 not violate Theorem 8.2.2? 

EXAMPLE 4 Basic Transformations That Are One-to-One and Onto M 

The linear transformations T1.P2 ^ R^ T2 : M22 —^R^ defined by 



$T_2\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a, b, c, d)$



are both one-to-one and onto (verify by showing that their kernels contain only the zero vector). 

EXAMPLE 5 A One-to-One Linear Transformation

Let $T: P_n \to P_{n+1}$ be the linear transformation

$T(\mathbf{p}) = T(p(x)) = xp(x)$

discussed in Example 5 of Section 8.1. If

$\mathbf{p} = p(x) = c_0 + c_1x + \cdots + c_nx^n$ and $\mathbf{q} = q(x) = d_0 + d_1x + \cdots + d_nx^n$

are distinct polynomials, then they differ in at least one coefficient. Thus,

$T(\mathbf{p}) = c_0x + c_1x^2 + \cdots + c_nx^{n+1}$ and $T(\mathbf{q}) = d_0x + d_1x^2 + \cdots + d_nx^{n+1}$

also differ in at least one coefficient. It follows that $T$ is one-to-one since it maps distinct polynomials $\mathbf{p}$ and $\mathbf{q}$ into distinct polynomials $T(\mathbf{p})$ and $T(\mathbf{q})$.

CALCULUS REQUIRED

EXAMPLE 6 A Transformation That Is Not One-to-One

Let

$D: C^1(-\infty, \infty) \to F(-\infty, \infty)$

be the differentiation transformation discussed in Example 11 of Section 8.1. This linear transformation is not one-to-one because it maps functions that differ by a constant into the same function. For example,

$D(x^2) = D(x^2 + 1) = 2x$


Dimension and Linear Transformations 



In the exercises we will ask you to prove the following two important facts about a linear transformation $T: V \to W$ in the case where $V$ and $W$ are finite-dimensional:

1. If $\dim(W) < \dim(V)$, then $T$ cannot be one-to-one.

2. If $\dim(V) < \dim(W)$, then $T$ cannot be onto.

Stated informally, if a linear transformation maps a "bigger" space to a "smaller" space, then some points in the "bigger" space must have the same image; and if a linear transformation maps a "smaller" space to a "bigger" space, then there must be points in the "bigger" space that are not images of any points in the "smaller" space.

Remark These observations tell us, for example, that any linear transformation from $R^3$ to $R^2$ must map some distinct points of $R^3$ into the same point in $R^2$, and they also tell us that there is no linear transformation that maps $R^2$ onto all of $R^3$.



Isomorphism 

Our next definition paves the way for the main result in this section. 



DEFINITION 3 



If a linear transformation $T: V \to W$ is both one-to-one and onto, then $T$ is said to be an isomorphism, and the vector spaces $V$ and $W$ are said to be isomorphic.

The word isomorphic is derived from the Greek words iso, meaning "identical," and morphe, meaning "form." This terminology is appropriate because, as we will now explain, isomorphic vector spaces have the same "algebraic form," even though they may consist of different kinds of objects. To illustrate this idea, examine Table 1 in which we have shown how the isomorphism

$a_0 + a_1x + a_2x^2 \ \xrightarrow{T}\ (a_0, a_1, a_2)$

matches up vector operations in $P_2$ and $R^3$.



Table 1

Operation in $P_2$ | Operation in $R^3$
$3(1 - 2x + 3x^2) = 3 - 6x + 9x^2$ | $3(1, -2, 3) = (3, -6, 9)$
$(2 + x - x^2) + (1 - x + 5x^2) = 3 + 4x^2$ | $(2, 1, -1) + (1, -1, 5) = (3, 0, 4)$
$(4 + 2x + 3x^2) - (2 - 4x + 3x^2) = 2 + 6x$ | $(4, 2, 3) - (2, -4, 3) = (2, 6, 0)$
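
The matching of operations in Table 1 can be reproduced with a short Python sketch. The representation is an assumption of this example: a polynomial in $P_2$ is stored as a dictionary from powers of $x$ to coefficients, and the hypothetical function `to_r3` plays the role of the isomorphism $a_0 + a_1x + a_2x^2 \to (a_0, a_1, a_2)$.

```python
def poly_add(p, q):
    """Add two polynomials stored as {power: coefficient}."""
    return {k: p.get(k, 0) + q.get(k, 0) for k in set(p) | set(q)}

def poly_scale(c, p):
    """Multiply a polynomial by the scalar c."""
    return {k: c * a for k, a in p.items()}

def to_r3(p):
    """The isomorphism from P2 to R^3: read off the coefficients."""
    return tuple(p.get(k, 0) for k in range(3))

def vec_add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def vec_scale(c, u):
    return tuple(c * a for a in u)

p = {0: 1, 1: -2, 2: 3}     # 1 - 2x + 3x^2
q1 = {0: 2, 1: 1, 2: -1}    # 2 + x - x^2
q2 = {0: 1, 1: -1, 2: 5}    # 1 - x + 5x^2
r1 = {0: 4, 1: 2, 2: 3}     # 4 + 2x + 3x^2
r2 = {0: 2, 1: -4, 2: 3}    # 2 - 4x + 3x^2

# Operate in P2 and then map to R^3, or map first and operate in R^3.
print(to_r3(poly_scale(3, p)), vec_scale(3, to_r3(p)))
print(to_r3(poly_add(q1, q2)), vec_add(to_r3(q1), to_r3(q2)))
print(to_r3(poly_add(r1, poly_scale(-1, r2))),
      vec_add(to_r3(r1), vec_scale(-1, to_r3(r2))))
```

In each row the two printed tuples agree, which is what it means for the map to preserve addition and scalar multiplication.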



The following theorem, which is one of the most important results in linear algebra, reveals the fundamental importance of the vector space $R^n$.

THEOREM 8.2.3

Every real $n$-dimensional vector space is isomorphic to $R^n$.

Theorem 8.2.3 tells us that a real $n$-dimensional vector space may differ from $R^n$ in notation, but its algebraic structure will be the same.

Proof Let $V$ be a real $n$-dimensional vector space. To prove that $V$ is isomorphic to $R^n$ we must find a linear transformation $T: V \to R^n$ that is one-to-one and onto. For this purpose, let

$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$

be any basis for $V$, let

$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n$ (1)

be the representation of a vector $\mathbf{u}$ in $V$ as a linear combination of the basis vectors, and define the transformation $T: V \to R^n$ by

$T(\mathbf{u}) = (k_1, k_2, \ldots, k_n)$ (2)

We will show that $T$ is an isomorphism (linear, one-to-one, and onto). To prove the linearity, let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $V$, let $c$ be a scalar, and let

$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n$ and $\mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_n\mathbf{v}_n$ (3)

be the representations of $\mathbf{u}$ and $\mathbf{v}$ as linear combinations of the basis vectors. Then it follows from (1) that

$T(c\mathbf{u}) = T(ck_1\mathbf{v}_1 + ck_2\mathbf{v}_2 + \cdots + ck_n\mathbf{v}_n)$
$= (ck_1, ck_2, \ldots, ck_n)$
$= c(k_1, k_2, \ldots, k_n) = cT(\mathbf{u})$

and it follows from (2) that

$T(\mathbf{u} + \mathbf{v}) = T((k_1 + d_1)\mathbf{v}_1 + (k_2 + d_2)\mathbf{v}_2 + \cdots + (k_n + d_n)\mathbf{v}_n)$
$= (k_1 + d_1, k_2 + d_2, \ldots, k_n + d_n)$
$= (k_1, k_2, \ldots, k_n) + (d_1, d_2, \ldots, d_n)$
$= T(\mathbf{u}) + T(\mathbf{v})$

which shows that $T$ is linear. To show that $T$ is one-to-one, we must show that if $\mathbf{u}$ and $\mathbf{v}$ are distinct vectors in $V$, then so are their images in $R^n$. But if $\mathbf{u} \ne \mathbf{v}$, and if the representations of these vectors in terms of the basis vectors are as in (3), then we must have $k_i \ne d_i$ for at least one $i$. Thus,

$T(\mathbf{u}) = (k_1, k_2, \ldots, k_n) \ne (d_1, d_2, \ldots, d_n) = T(\mathbf{v})$

which shows that $\mathbf{u}$ and $\mathbf{v}$ have distinct images under $T$. Finally, the transformation $T$ is onto, for if

$\mathbf{w} = (k_1, k_2, \ldots, k_n)$

is any vector in $R^n$, then it follows from (2) that $\mathbf{w}$ is the image under $T$ of the vector

$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n$

Remark Note that the isomorphism $T$ in Formula (2) of the foregoing proof is the coordinate map

$\mathbf{u} \ \xrightarrow{T}\ (k_1, k_2, \ldots, k_n) = (\mathbf{u})_S$

that maps $\mathbf{u}$ into its coordinate vector with respect to the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. Since there are generally many possible bases for a given vector space $V$, there are generally many possible isomorphisms between $V$ and $R^n$, one for each different basis.

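EXAMPLE 7 The Natural Isomorphism from $P_{n-1}$ to $R^n$

We leave it for you to verify that the mapping

$a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \ \xrightarrow{T}\ (a_0, a_1, \ldots, a_{n-1})$

from $P_{n-1}$ to $R^n$ is one-to-one, onto, and linear. This is called the natural isomorphism from $P_{n-1}$ to $R^n$ because, as the following computations show, it maps the natural basis $\{1, x, x^2, \ldots, x^{n-1}\}$ for $P_{n-1}$ into the standard basis for $R^n$:

$1 = 1 + 0x + 0x^2 + \cdots + 0x^{n-1} \ \xrightarrow{T}\ (1, 0, 0, \ldots, 0)$
$x = 0 + x + 0x^2 + \cdots + 0x^{n-1} \ \xrightarrow{T}\ (0, 1, 0, \ldots, 0)$
$\vdots$
$x^{n-1} = 0 + 0x + 0x^2 + \cdots + x^{n-1} \ \xrightarrow{T}\ (0, 0, 0, \ldots, 1)$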



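EXAMPLE 8 The Natural Isomorphism from $M_{22}$ to $R^4$

The matrices

$E_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $E_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $E_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$, $E_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$

form a basis for the vector space $M_{22}$ of $2 \times 2$ matrices. An isomorphism $T: M_{22} \to R^4$ can be constructed by first writing a matrix $A$ in $M_{22}$ in terms of the basis vectors as

$A = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} = a_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + a_2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + a_3\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + a_4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$

and then defining $T$ as

$T(A) = (a_1, a_2, a_3, a_4)$

Thus, for example,

$\begin{bmatrix} 1 & -3 \\ 4 & 6 \end{bmatrix} \ \xrightarrow{T}\ (1, -3, 4, 6)$

More generally, this idea can be used to show that the vector space $M_{mn}$ of $m \times n$ matrices with real entries is isomorphic to $R^{mn}$.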


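EXAMPLE 9 Differentiation by Matrix Multiplication

Consider the differentiation transformation $D: P_3 \to P_2$ on the vector space of polynomials of degree three or less. If we map $P_3$ and $P_2$ into $R^4$ and $R^3$, respectively, by the natural isomorphisms, then the transformation $D$ produces a corresponding matrix transformation from $R^4$ to $R^3$. Specifically, the derivative transformation

$a_0 + a_1x + a_2x^2 + a_3x^3 \ \xrightarrow{D}\ a_1 + 2a_2x + 3a_3x^2$

produces the matrix transformation

$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 3a_3 \end{bmatrix}$

Thus, for example, the derivative

$\frac{d}{dx}(2 + x + 4x^2 - x^3) = 1 + 8x - 3x^2$

can be calculated as the matrix product

$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 4 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 8 \\ -3 \end{bmatrix}$

This idea is useful for constructing numerical algorithms to perform derivative calculations.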
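
As a minimal sketch of such an algorithm (the function names `differentiation_matrix` and `mat_vec` are assumptions of this example), the matrix of $D: P_n \to P_{n-1}$ relative to the natural bases can be built for any $n$ and applied to a coefficient vector:

```python
def differentiation_matrix(n):
    """Matrix of d/dx from P_n to P_(n-1) in the natural bases.
    Row i has the single nonzero entry i + 1 in column i + 1,
    because the derivative of x^(i+1) is (i + 1) x^i."""
    return [[(j if j == i + 1 else 0) for j in range(n + 1)]
            for i in range(n)]

def mat_vec(matrix, vector):
    """Matrix-vector product with plain Python lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

D = differentiation_matrix(3)
p = [2, 1, 4, -1]        # coefficients of 2 + x + 4x^2 - x^3
print(D)
print(mat_vec(D, p))     # coefficients of the derivative
```

The product reproduces the coefficient vector $(1, 8, -3)$ of $1 + 8x - 3x^2$ computed in the example.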



Inner Product Space Isomorphisms 

In the case where $V$ is a real $n$-dimensional inner product space, both $V$ and $R^n$ have, in addition to their algebraic structure, a geometric structure arising from their respective inner products. Thus, it is reasonable to inquire if there exists an isomorphism from $V$ to $R^n$ that preserves the geometric structure as well as the algebraic structure. For example, we would want orthogonal vectors in $V$ to have orthogonal counterparts in $R^n$, and we would want orthonormal sets in $V$ to correspond to orthonormal sets in $R^n$.

In order for an isomorphism to preserve geometric structure, it obviously has to preserve inner products, since notions of length, angle, and orthogonality are all based on the inner product. Thus, if $V$ and $W$ are inner product spaces, then we call an isomorphism $T: V \to W$ an inner product space isomorphism if

$\langle T(\mathbf{u}), T(\mathbf{v}) \rangle = \langle \mathbf{u}, \mathbf{v} \rangle$

for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$. It can be proved that if $V$ is any real $n$-dimensional inner product space and $R^n$ has the Euclidean inner product (the dot product), then there exists an inner product space isomorphism from $V$ to $R^n$. Under such an isomorphism, the inner product space $V$ has the same algebraic and geometric structure as $R^n$. In this sense, every $n$-dimensional inner product space is a "carbon copy" of $R^n$ with the Euclidean inner product that differs only in the notation used to represent vectors.

EXAMPLE 10 An Inner Product Space Isomorphism

Let $R^n$ be the vector space of real $n$-tuples in comma-delimited form, let $M_n$ be the vector space of real $n \times 1$ matrices, let $R^n$ have the Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v}$, and let $M_n$ have the inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T\mathbf{v}$ in which $\mathbf{u}$ and $\mathbf{v}$ are expressed in column form. The mapping $T: R^n \to M_n$ defined by

$(v_1, v_2, \ldots, v_n) \ \xrightarrow{T}\ \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$

is an inner product space isomorphism, so the distinction between the inner product space $R^n$ and the inner product space $M_n$ is essentially notational, a fact that we have used many times in this text.



Concept Review 

• One-to-one 

• Onto 

• Isomorphism 

• Isomorphic vector spaces 

• Natural isomorphism 

• Inner product space isomorphism 

Skills 

• Determine whether a linear transformation is one-to-one. 

• Determine whether a linear transformation is onto. 

• Determine whether a linear transformation is an isomorphism. 



Exercise Set 8.2 

1. In each part, find ker(T), and determine whether the linear transformation T is one-to-one.

(a) \(T\colon R^2 \to R^2\), where \(T(x, y) = (y, x)\)

(b) \(T\colon R^2 \to R^2\), where \(T(x, y) = (0, 2x + 3y)\)

(c) \(T\colon R^2 \to R^2\), where \(T(x, y) = (x + y, x - y)\)

(d) \(T\colon R^2 \to R^3\), where \(T(x, y) = (x, y, x + y)\)

(e) \(T\colon R^2 \to R^3\), where \(T(x, y) = (x - y, y - x, 2x - 2y)\)

(f) \(T\colon R^3 \to R^2\), where \(T(x, y, z) = (x + y + z, x - y - z)\)

Answer: 

(a) ker(T) = {0}; T is one-to-one

(b) ker(T) = \(\{k(-\tfrac{3}{2}, 1)\}\); T is not one-to-one

(c) ker(T) = {0}; T is one-to-one

(d) ker(T) = {0}; T is one-to-one

(e) ker(T) = \(\{k(1, 1)\}\); T is not one-to-one

(f) ker(T) = \(\{k(0, 1, -1)\}\); T is not one-to-one

2. Which of the transformations in Exercise 1 are onto? 

3. In each part, determine whether multiplication by A is a one-to-one linear transformation.



(a) 



1 -2 

2 -4 



(b) 



A = 



(c) 



-3 

1 
2 
-1 

4 -2 
1 5 

5 3 



6 

3 5 7 
-12 4 
3 0 0 



Answer: 



(a) Not one-to-one 

(b) Not one-to-one 

(c) One-to-one 

4. Which of the transformations in Exercise 3 are onto?

5. As indicated in the accompanying figure, let \(T\colon R^2 \to R^2\) be the orthogonal projection on the line \(y = x\).

(a) Find the kernel of T. 

(b) Is T one-to-one? Justify your conclusion. 




Figure Ex-5 



Answer: 

(a) ker(T) = \(\{k(-1, 1)\}\)

(b) T is not one-to-one since ker(T) \(\neq\) {0}.

6. As indicated in the accompanying figure, let \(T\colon R^2 \to R^2\) be the linear operator that reflects each point about the y-axis.

(a) Find the kernel of T. 

(b) Is T one-to-one? Justify your conclusion. 



Figure Ex-6 

7. In each part, use the given information to determine whether the linear transformation T is one-to-one.

(a) \(T\colon R^m \to R^m\); nullity(T) = 0

(b) \(T\colon R^n \to R^n\); rank(T) = \(n - 1\)

(c) \(T\colon R^m \to R^n\); \(n < m\)

(d) \(T\colon R^n \to R^n\); \(R(T) = R^n\)

Answer: 

(a) T is one-to-one 

(b) T is not one-to-one

(c) T is not one-to-one 

(d) T is one-to-one 

8. In each part, determine whether the linear transformation T is one-to-one. 

(a) \(T\colon P_2 \to P_3\), where \(T(a_0 + a_1 x + a_2 x^2) = x(a_0 + a_1 x + a_2 x^2)\)

(b) \(T\colon P_2 \to P_2\), where \(T(p(x)) = p(x + 1)\)

9. Prove: If V and W are finite-dimensional vector spaces such that dim(W) < dim(V), then there is no one-to-one linear transformation \(T\colon V \to W\).

10. Prove: There can be an onto linear transformation from V to W only if dim(V) \(\geq\) dim(W).

11. (a) Find an isomorphism between the vector space of all \(3 \times 3\) symmetric matrices and \(R^6\).

(b) Find two different isomorphisms between the vector space of all \(2 \times 2\) matrices and \(R^4\).

(c) Find an isomorphism between the vector space of all polynomials of degree at most 3 such that \(p(0) = 0\) and \(R^3\).

(d) Find an isomorphism between the vector spaces span{1, sin(x), cos(x)} and \(R^3\).



Answer: 

(a) 

T 



a 


b 


c 


b 


d 


e 


c 


e 


f 



(b) 



T{a -h b s\n{x) + c: cos (t:) ) = 



12. 

{Calculus required) Let J.P\ • ^ be the integration transformation J 



p(x)dx. Determine whether J is 



one-to-one. Justify your conclusion. 

13. (Calculus required) Let V be the vector space \(C^1[0, 1]\) and let \(T\colon V \to R\) be defined by

\[ T(f) = f(0) + 2f'(0) + 3f'(1) \]

Verify that T is a linear transformation. Determine whether T is one-to-one, and justify your conclusion.

Answer:

T is not one-to-one since, for example, \(f(x) = x^2(x - 1)^2\) is in its kernel.

14. (Calculus required) Devise a method for using matrix multiplication to differentiate functions in the vector space span{1, sin(x), cos(x), sin(2x), cos(2x)}. Use your method to find the derivative of \(3 - 4\sin(x) + \sin(2x) + 5\cos(2x)\).

15. Does the formula \(T(a, b, c) = ax^2 + bx + c\) define a one-to-one linear transformation from \(R^3\) to \(P_2\)? Explain your reasoning.

Answer: 

Yes; it is one-to-one 

16. Let E be a fixed \(2 \times 2\) elementary matrix. Does the formula \(T(A) = EA\) define a one-to-one linear operator on \(M_{22}\)? Explain your reasoning.

17. Let \(\mathbf{a}\) be a fixed vector in \(R^3\). Does the formula \(T(\mathbf{v}) = \mathbf{a} \times \mathbf{v}\) define a one-to-one linear operator on \(R^3\)? Explain your reasoning.

Answer: 

T is not one-to-one since, for example, \(\mathbf{a}\) is in its kernel.

18. Prove that an inner product space isomorphism preserves angles and distances; that is, the angle between \(\mathbf{u}\) and \(\mathbf{v}\) in V is equal to the angle between \(T(\mathbf{u})\) and \(T(\mathbf{v})\) in W, and \(\|\mathbf{u} - \mathbf{v}\|_V = \|T(\mathbf{u}) - T(\mathbf{v})\|_W\).

19. Does an inner product space isomorphism map orthonormal sets to orthonormal sets? Justify your answer. 
Answer: 

Yes 

20. Find an inner product space isomorphism between and M23- 

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer, 
(a) The vector spaces and P2 are isomorphic. 
Answer: 



False 

(b) If the kernel of a linear transformation T:P2 — ► is {0} , then Tis an isomorphism. 



Answer: 

True 

(c) Every linear transformation from to P9 is an isomorphism. 
Answer: 

False 

(d) There is a subspace of that is isomorphic to R^. 
Answer: 

True 

(e) There is a \(2 \times 2\) matrix P such that \(T\colon M_{22} \to M_{22}\) defined by \(T(A) = AP - PA\) is an isomorphism.
Answer:

False

(f) There is a linear transformation \(T\colon P_4 \to P_4\) such that the kernel of T is isomorphic to the range of T.
Answer: 

False 

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



8.3 Compositions and Inverse Transformations 

In Section 4.10 we discussed compositions and inverses of matrix transformations. In this section we will 
extend some of those ideas to general linear transformations. 

Composition of Linear Transformations 

The following definition extends Formula 1 of Section 4.10 to general linear transformations. 

Note that the word "with" establishes the order of the operations in a composition. The composition of \(T_2\) with \(T_1\) is

\[ (T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u})) \]

whereas the composition of \(T_1\) with \(T_2\) is

\[ (T_1 \circ T_2)(\mathbf{u}) = T_1(T_2(\mathbf{u})) \]

DEFINITION 1 

If \(T_1\colon U \to V\) and \(T_2\colon V \to W\) are linear transformations, then the composition of \(T_2\) with \(T_1\), denoted by \(T_2 \circ T_1\) (which is read "\(T_2\) circle \(T_1\)"), is the function defined by the formula

\[ (T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u})) \tag{1} \]

where \(\mathbf{u}\) is a vector in U.

Remark Observe that this definition requires that the domain of \(T_2\) (which is V) contain the range of \(T_1\). This is essential for the formula \(T_2(T_1(\mathbf{u}))\) to make sense (Figure 8.3.1).




Figure 8.3.1 The composition of \(T_2\) with \(T_1\).
Our first theorem shows that the composition of two linear transformations is itself a linear transformation. 



THEOREM 8.3.1 



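If \(T_1\colon U \to V\) and \(T_2\colon V \to W\) are linear transformations, then \((T_2 \circ T_1)\colon U \to W\) is also a linear transformation.

Proof If \(\mathbf{u}\) and \(\mathbf{v}\) are vectors in U and c is a scalar, then it follows from 1 and the linearity of \(T_1\) and \(T_2\) that

\[ \begin{aligned} (T_2 \circ T_1)(\mathbf{u} + \mathbf{v}) &= T_2(T_1(\mathbf{u} + \mathbf{v})) = T_2(T_1(\mathbf{u}) + T_1(\mathbf{v})) \\ &= T_2(T_1(\mathbf{u})) + T_2(T_1(\mathbf{v})) \\ &= (T_2 \circ T_1)(\mathbf{u}) + (T_2 \circ T_1)(\mathbf{v}) \end{aligned} \]

and

\[ \begin{aligned} (T_2 \circ T_1)(c\mathbf{u}) &= T_2(T_1(c\mathbf{u})) = T_2(cT_1(\mathbf{u})) \\ &= cT_2(T_1(\mathbf{u})) = c(T_2 \circ T_1)(\mathbf{u}) \end{aligned} \]

Thus, \(T_2 \circ T_1\) satisfies the two requirements of a linear transformation.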

EXAMPLE 1 Composition of Linear Transformations

Let \(T_1\colon P_1 \to P_2\) and \(T_2\colon P_2 \to P_2\) be the linear transformations given by the formulas

\[ T_1(p(x)) = xp(x) \quad \text{and} \quad T_2(p(x)) = p(2x + 4) \]

Then the composition \((T_2 \circ T_1)\colon P_1 \to P_2\) is given by the formula

\[ (T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (2x + 4)p(2x + 4) \]

In particular, if \(p(x) = c_0 + c_1 x\), then

\[ \begin{aligned} (T_2 \circ T_1)(p(x)) &= (T_2 \circ T_1)(c_0 + c_1 x) = (2x + 4)(c_0 + c_1(2x + 4)) \\ &= c_0(2x + 4) + c_1(2x + 4)^2 \end{aligned} \]
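The composition in Example 1 can be checked numerically. The following is a minimal sketch under the assumption that a polynomial is stored as a list of coefficients in increasing powers of x; the helper names (`poly_add`, `poly_mul`, `substitute`, `T1`, `T2`) are illustrative and not from the text.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def substitute(p, q):
    """Return the coefficients of p(q(x)) using Horner's rule."""
    result = [p[-1]]
    for c in reversed(p[:-1]):
        result = poly_add(poly_mul(result, q), [c])
    return result


def T1(p):
    """T1(p(x)) = x p(x): shift every coefficient up one degree."""
    return [0] + p


def T2(p):
    """T2(p(x)) = p(2x + 4)."""
    return substitute(p, [4, 2])


c0, c1 = 3, -1
composed = T2(T1([c0, c1]))                     # (T2 o T1)(c0 + c1 x)
two_x_plus_4 = [4, 2]
formula = poly_add([c0 * a for a in two_x_plus_4],
                   [c1 * a for a in poly_mul(two_x_plus_4, two_x_plus_4)])
print(composed, formula)                        # both are [-4, -10, -4]
```

Both routes give \(-4 - 10x - 4x^2\), in agreement with the closed form \(c_0(2x + 4) + c_1(2x + 4)^2\).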



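EXAMPLE 2 Composition with the Identity Operator

If \(T\colon V \to V\) is any linear operator, and if \(I\colon V \to V\) is the identity operator (Example 3 of Section 8.1), then for all vectors \(\mathbf{v}\) in V, we have

\[ (T \circ I)(\mathbf{v}) = T(I(\mathbf{v})) = T(\mathbf{v}) \]

\[ (I \circ T)(\mathbf{v}) = I(T(\mathbf{v})) = T(\mathbf{v}) \]

It follows that \(T \circ I\) and \(I \circ T\) are the same as T; that is,

\[ T \circ I = T \quad \text{and} \quad I \circ T = T \tag{2} \]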



As illustrated in Figure 8.3.2, compositions can be defined for more than two linear transformations. For example, if

\[ T_1\colon U \to V, \quad T_2\colon V \to W, \quad \text{and} \quad T_3\colon W \to Y \]

are linear transformations, then the composition \(T_3 \circ T_2 \circ T_1\) is defined by

\[ (T_3 \circ T_2 \circ T_1)(\mathbf{u}) = T_3(T_2(T_1(\mathbf{u}))) \tag{3} \]

Figure 8.3.2 The composition of three linear transformations. 



Inverse Linear Transformations 

In Theorem 4.10.1 we showed that a matrix operator \(T_A\colon R^n \to R^n\) is one-to-one if and only if the matrix A is invertible, in which case the inverse operator is \(T_{A^{-1}}\). We then showed that if \(\mathbf{w}\) is the image of a vector \(\mathbf{x}\) under the operator \(T_A\), then \(\mathbf{x}\) is the image under \(T_{A^{-1}}\) of the vector \(\mathbf{w}\) (see Figure 4.10.8). Our next objective is to extend the notion of invertibility to general linear transformations.

Recall that if \(T\colon V \to W\) is a linear transformation, then the range of T, denoted by \(R(T)\), is the subspace of W consisting of all images under T of vectors in V. If T is one-to-one, then each vector \(\mathbf{w}\) in \(R(T)\) is the image of a unique vector \(\mathbf{v}\) in V. This uniqueness allows us to define a new function, called the inverse of T and denoted by \(T^{-1}\), that maps \(\mathbf{w}\) back into \(\mathbf{v}\) (Figure 8.3.3).

Figure 8.3.3 The inverse of T maps \(T(\mathbf{v})\) back into \(\mathbf{v}\).



It can be proved (Exercise 19) that \(T^{-1}\colon R(T) \to V\) is a linear transformation. Moreover, it follows from the definition of \(T^{-1}\) that

\[ T^{-1}(T(\mathbf{v})) = T^{-1}(\mathbf{w}) = \mathbf{v} \tag{4} \]

\[ T(T^{-1}(\mathbf{w})) = T(\mathbf{v}) = \mathbf{w} \tag{5} \]

so that T and \(T^{-1}\), when applied in succession in either order, cancel the effect of each other.

Remark It is important to note that if \(T\colon V \to W\) is a one-to-one linear transformation, then the domain of \(T^{-1}\) is the range of T, where the range may or may not be all of W. However, in the special case where \(T\colon V \to V\) is a one-to-one linear operator and V is \(n\)-dimensional, it follows from Theorem 8.2.2 that T must also be onto, so the domain of \(T^{-1}\) is all of V.



EXAMPLE 3 An Inverse Transformation

In Example 5 of Section 8.2 we showed that the linear transformation \(T\colon P_n \to P_{n+1}\) given by

\[ T(p) = T(p(x)) = xp(x) \]

is one-to-one; thus, T has an inverse. In this case the range of T is not all of \(P_{n+1}\) but rather the subspace of \(P_{n+1}\) consisting of polynomials with a zero constant term. This is evident from the formula for T:

\[ T(c_0 + c_1 x + \cdots + c_n x^n) = c_0 x + c_1 x^2 + \cdots + c_n x^{n+1} \]

It follows that \(T^{-1}\colon R(T) \to P_n\) is given by the formula

\[ T^{-1}(c_0 x + c_1 x^2 + \cdots + c_n x^{n+1}) = c_0 + c_1 x + \cdots + c_n x^n \]

For example, in the case where \(n \geq 3\),

\[ T^{-1}(2x - x^2 + 5x^3 + 3x^4) = 2 - x + 5x^2 + 3x^3 \]



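EXAMPLE 4 An Inverse Transformation

Let \(T\colon R^3 \to R^3\) be the linear operator defined by the formula

\[ T(x_1, x_2, x_3) = (3x_1 + x_2,\; -2x_1 - 4x_2 + 3x_3,\; 5x_1 + 4x_2 - 2x_3) \]

Determine whether T is one-to-one; if so, find \(T^{-1}(x_1, x_2, x_3)\).

Solution It follows from Formula 12 of Section 4.9 that the standard matrix for T is

\[ [T] = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix} \]

(verify). This matrix is invertible, and from Formula 7 of Section 4.10 the standard matrix for \(T^{-1}\) is

\[ [T^{-1}] = [T]^{-1} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix} \]

It follows that

\[ T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = [T^{-1}] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4x_1 - 2x_2 - 3x_3 \\ -11x_1 + 6x_2 + 9x_3 \\ -12x_1 + 7x_2 + 10x_3 \end{bmatrix} \]

Expressing this result in horizontal notation yields

\[ T^{-1}(x_1, x_2, x_3) = (4x_1 - 2x_2 - 3x_3,\; -11x_1 + 6x_2 + 9x_3,\; -12x_1 + 7x_2 + 10x_3) \]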
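The "(verify)" step in Example 4 can be carried out mechanically. The sketch below, with illustrative helper names `mat_mul` and `mat_vec` that are assumptions rather than notation from the text, multiplies the two standard matrices and also checks on one sample vector that \(T^{-1}\) undoes T.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(A, x):
    """Product of a matrix and a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


T = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]           # standard matrix for T
T_inv = [[4, -2, -3], [-11, 6, 9], [-12, 7, 10]]   # claimed standard matrix for the inverse

product = mat_mul(T, T_inv)   # should be the 3 x 3 identity matrix
x = [1, 2, 3]
w = mat_vec(T, x)             # image of x under T
back = mat_vec(T_inv, w)      # applying the inverse should recover x
print(product, w, back)
```

Because the product is the identity, the matrix found in the example is indeed the inverse, and the round trip returns the starting vector.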



Composition of One-To-One Linear Transformations 

The following theorem shows that a composition of one-to-one linear transformations is one-to-one, and it 
relates the inverse of a composition to the inverses of its individual linear transformations. 

THEOREM 8.3.2 

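If \(T_1\colon U \to V\) and \(T_2\colon V \to W\) are one-to-one linear transformations, then

(a) \(T_2 \circ T_1\) is one-to-one.

(b) \((T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}\).

Proof (a) We want to show that \(T_2 \circ T_1\) maps distinct vectors in U into distinct vectors in W. But if \(\mathbf{u}\) and \(\mathbf{v}\) are distinct vectors in U, then \(T_1(\mathbf{u})\) and \(T_1(\mathbf{v})\) are distinct vectors in V since \(T_1\) is one-to-one. This and the fact that \(T_2\) is one-to-one imply that

\[ T_2(T_1(\mathbf{u})) \quad \text{and} \quad T_2(T_1(\mathbf{v})) \]

are also distinct vectors. But these expressions can also be written as

\[ (T_2 \circ T_1)(\mathbf{u}) \quad \text{and} \quad (T_2 \circ T_1)(\mathbf{v}) \]

so \(T_2 \circ T_1\) maps \(\mathbf{u}\) and \(\mathbf{v}\) into distinct vectors in W.

Proof (b) We want to show that

\[ (T_2 \circ T_1)^{-1}(\mathbf{w}) = (T_1^{-1} \circ T_2^{-1})(\mathbf{w}) \]

for every vector \(\mathbf{w}\) in the range of \(T_2 \circ T_1\). For this purpose, let

\[ \mathbf{u} = (T_2 \circ T_1)^{-1}(\mathbf{w}) \tag{6} \]

so our goal is to show that

\[ \mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w}) \]

But it follows from 6 that

\[ (T_2 \circ T_1)(\mathbf{u}) = \mathbf{w} \]

or, equivalently,

\[ T_2(T_1(\mathbf{u})) = \mathbf{w} \]

Now, taking \(T_2^{-1}\) of each side of this equation, then taking \(T_1^{-1}\) of each side of the result, and then using 4 yields (verify)

\[ \mathbf{u} = T_1^{-1}(T_2^{-1}(\mathbf{w})) \]

or, equivalently,

\[ \mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w}) \]

In words, part (b) of Theorem 8.3.2 states that the inverse of a composition is the composition of the inverses in the reverse order. This result can be extended to compositions of three or more linear transformations; for example,

\[ (T_3 \circ T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1} \circ T_3^{-1} \tag{7} \]

In the case where \(T_A\), \(T_B\), and \(T_C\) are matrix operators on \(R^n\), Formula 7 can be written as

\[ (T_C \circ T_B \circ T_A)^{-1} = T_A^{-1} \circ T_B^{-1} \circ T_C^{-1} \]

or alternatively as

\[ (T_{CBA})^{-1} = T_{A^{-1}B^{-1}C^{-1}} \tag{8} \]

Note the order of the subscripts on the two sides of Formula 8.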
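The reversal of order can be illustrated with two matrix operators on \(R^2\). This is a small sketch with sample matrices A and B chosen here purely for illustration (they do not appear in the text); since \(T_B \circ T_A\) is multiplication by BA, its inverse should be multiplication by \(A^{-1}B^{-1}\), not \(B^{-1}A^{-1}\).

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by the adjugate formula, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 1], [0, 1]]   # sample invertible matrix (assumption for illustration)
B = [[1, 0], [1, 1]]   # second sample invertible matrix

inverse_of_composition = inverse_2x2(mat_mul(B, A))          # (T_B o T_A)^(-1)
reversed_order = mat_mul(inverse_2x2(A), inverse_2x2(B))     # A^(-1) B^(-1)
same_order = mat_mul(inverse_2x2(B), inverse_2x2(A))         # B^(-1) A^(-1)
print(inverse_of_composition == reversed_order)              # True
print(inverse_of_composition == same_order)                  # False
```

For these two matrices only the reversed product reproduces the inverse of the composition, which is the content of part (b) of Theorem 8.3.2.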



Concept Review 

• Composition of linear transformations 

• Inverse of a linear transformation 

Skills

• Find the domain and range of the composition of two linear transformations.

• Find the composition of two linear transformations.

• Determine whether a linear transformation has an inverse.

• Find the inverse of a linear transformation. 



Exercise Set 8.3 



1. Find \((T_2 \circ T_1)(x, y)\).

(a) \(T_1(x, y) = (2x, 3y)\), \(T_2(x, y) = (x - y, x + y)\)

(b) \(T_1(x, y) = (x - 3y, 0)\), \(T_2(x, y) = (4x - 5y, 3x - 6y)\)

(c) \(T_1(x, y) = (2x, -3y, x + y)\), \(T_2(x, y, z) = (x - y, y + z)\)

(d) \(T_1(x, y) = (x - y, y, x)\), \(T_2(x, y, z) = (0, x + y + z)\)

Answer:

(a) \((T_2 \circ T_1)(x, y) = (2x - 3y, 2x + 3y)\)

(b) \((T_2 \circ T_1)(x, y) = (4x - 12y, 3x - 9y)\)

(c) \((T_2 \circ T_1)(x, y) = (2x + 3y, x - 2y)\)

(d) \((T_2 \circ T_1)(x, y) = (0, 2x)\)

2. Find \((T_3 \circ T_2 \circ T_1)(x, y)\).

(a) \(T_1(x, y) = (-2y, 3x, x - 2y)\), \(T_2(x, y, z) = (y, z, x)\), \(T_3(x, y, z) = (x + z, y - z)\)

(b) \(T_1(x, y) = (x + y, y, -x)\), \(T_2(x, y, z) = (0, x + y + z, 3y)\), \(T_3(x, y, z) = (3x + 2y, 4z - x - 3y)\)

3. Let \(T_1\colon M_{22} \to R\) and \(T_2\colon M_{22} \to M_{22}\) be the linear transformations given by \(T_1(A) = \operatorname{tr}(A)\) and \(T_2(A) = A^T\).

(a) Find \((T_1 \circ T_2)(A)\), where \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\).

(b) Can you find \((T_2 \circ T_1)(A)\)? Explain.

Answer:

(a) \(a + d\)

(b) \((T_2 \circ T_1)(A)\) does not exist since \(T_1(A)\) is not a \(2 \times 2\) matrix.

4. Let \(T_1\colon P_n \to P_n\) and \(T_2\colon P_n \to P_n\) be the linear operators given by \(T_1(p(x)) = p(x - 1)\) and \(T_2(p(x)) = p(x + 1)\). Find \((T_1 \circ T_2)(p(x))\) and \((T_2 \circ T_1)(p(x))\).

5. Let \(T_1\colon V \to V\) be the dilation \(T_1(\mathbf{v}) = 4\mathbf{v}\). Find a linear operator \(T_2\colon V \to V\) such that \(T_1 \circ T_2 = I\) and \(T_2 \circ T_1 = I\).

Answer:

\(T_2(\mathbf{v}) = \tfrac{1}{4}\mathbf{v}\)

6. Suppose that the linear transformations \(T_1\colon P_2 \to P_2\) and \(T_2\colon P_2 \to P_3\) are given by the formulas \(T_1(p(x)) = p(x + 1)\) and \(T_2(p(x)) = xp(x)\). Find \((T_2 \circ T_1)(a_0 + a_1 x + a_2 x^2)\).

7. Let \(q_0(x)\) be a fixed polynomial of degree m, and define a function T with domain \(P_n\) by the formula \(T(p(x)) = p(q_0(x))\). Show that T is a linear transformation.

8. Use the definition of \(T_3 \circ T_2 \circ T_1\) given by Formula 3 to prove that

(a) \(T_3 \circ T_2 \circ T_1\) is a linear transformation.

(b) \(T_3 \circ T_2 \circ T_1 = (T_3 \circ T_2) \circ T_1\).

(c) \(T_3 \circ T_2 \circ T_1 = T_3 \circ (T_2 \circ T_1)\).

9. Let \(T\colon R^3 \to R^3\) be the orthogonal projection of \(R^3\) onto the xy-plane. Show that \(T \circ T = T\).

10. In each part, let X'R^ — ► be multiplication by A. Determine whether 7 has an inverse; if so, find 

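11. In each part, let \(T\colon R^3 \to R^3\) be multiplication by A. Determine whether T has an inverse; if so, find

\[ T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) \]

(a) \(A = \begin{bmatrix} 1 & 5 & 2 \\ 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix}\)

(b) \(A = \begin{bmatrix} 1 & 4 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix}\)

(c) \(A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}\)

(d) \(A = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 2 & -1 \\ 2 & 3 & 0 \end{bmatrix}\)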



Answer: 



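(a) T has no inverse.

(d) \( T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 3x_1 + 3x_2 - x_3 \\ -2x_1 - 2x_2 + x_3 \\ -4x_1 - 5x_2 + 2x_3 \end{bmatrix} \)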



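12. In each part, determine whether the linear operator \(T\colon R^n \to R^n\) is one-to-one; if so, find \(T^{-1}(x_1, x_2, \ldots, x_n)\).

(a) \(T(x_1, x_2, \ldots, x_n) = (0, x_1, x_2, \ldots, x_{n-1})\)

(b) \(T(x_1, x_2, \ldots, x_n) = (x_n, x_{n-1}, \ldots, x_2, x_1)\)

(c) \(T(x_1, x_2, \ldots, x_n) = (x_2, x_3, \ldots, x_n, x_1)\)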

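13. Let \(T\colon R^n \to R^n\) be the linear operator defined by the formula

\[ T(x_1, x_2, \ldots, x_n) = (a_1 x_1, a_2 x_2, \ldots, a_n x_n) \]

where \(a_1, \ldots, a_n\) are constants.

(a) Under what conditions will T have an inverse?

(b) Assuming that the conditions determined in part (a) are satisfied, find a formula for \(T^{-1}(x_1, x_2, \ldots, x_n)\).

Answer:

(a) \(a_i \neq 0\) for \(i = 1, 2, 3, \ldots, n\)

(b) \(T^{-1}(x_1, x_2, x_3, \ldots, x_n) = \left(\tfrac{1}{a_1}x_1, \tfrac{1}{a_2}x_2, \tfrac{1}{a_3}x_3, \ldots, \tfrac{1}{a_n}x_n\right)\)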

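14. Let \(T_1\colon R^2 \to R^2\) and \(T_2\colon R^2 \to R^2\) be the linear operators given by the formulas

\[ T_1(x, y) = (x + y, x - y) \quad \text{and} \quad T_2(x, y) = (2x + y, x - 2y) \]

(a) Show that \(T_1\) and \(T_2\) are one-to-one.

(b) Find formulas for \(T_1^{-1}(x, y)\), \(T_2^{-1}(x, y)\), and \((T_2 \circ T_1)^{-1}(x, y)\).

(c) Verify that \((T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}\).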



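15. Let \(T_1\colon P_2 \to P_3\) and \(T_2\colon P_3 \to P_3\) be the linear transformations given by the formulas

\[ T_1(p(x)) = xp(x) \quad \text{and} \quad T_2(p(x)) = p(x + 1) \]

(a) Find formulas for \(T_1^{-1}(p(x))\), \(T_2^{-1}(p(x))\), and \((T_2 \circ T_1)^{-1}(p(x))\).

(b) Verify that \((T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}\).

Answer:

(a) \(T_1^{-1}(p(x)) = \dfrac{p(x)}{x}\); \(T_2^{-1}(p(x)) = p(x - 1)\); \((T_2 \circ T_1)^{-1}(p(x)) = \dfrac{1}{x}\,p(x - 1)\)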

16. Let \(T_A\colon R^3 \to R^3\), \(T_B\colon R^3 \to R^3\), and \(T_C\colon R^3 \to R^3\) be the reflections about the xy-plane, the xz-plane, and the yz-plane, respectively. Verify Formula 8 for these linear operators.

17. Let \(T\colon P_1 \to R^2\) be the function defined by the formula

\[ T(p(x)) = (p(0), p(1)) \]

(a) Find \(T(1 - 2x)\).

(b) Show that T is a linear transformation.

(c) Show that T is one-to-one.

(d) Find \(T^{-1}(2, 3)\) and sketch its graph.

Answer:

(a) \((1, -1)\)

(d) \(T^{-1}(2, 3) = 2 + x\)

18. Let \(T\colon R^2 \to R^2\) be the linear operator given by the formula \(T(x, y) = (x + ky, -y)\). Show that T is one-to-one and that \(T^{-1} = T\) for every real value of k.

19. Prove: If \(T\colon V \to W\) is a one-to-one linear transformation, then \(T^{-1}\colon R(T) \to V\) is a one-to-one linear transformation.

In Exercises 20-21, determine whether \(T_1 \circ T_2 = T_2 \circ T_1\).

20. (a) \(T_1\colon R^2 \to R^2\) is the orthogonal projection on the x-axis, and \(T_2\colon R^2 \to R^2\) is the orthogonal projection on the y-axis.

(b) \(T_1\colon R^2 \to R^2\) is the rotation about the origin through an angle \(\theta_1\), and \(T_2\colon R^2 \to R^2\) is the rotation about the origin through an angle \(\theta_2\).

(c) \(T_1\colon R^3 \to R^3\) is the rotation about the x-axis through an angle \(\theta_1\), and \(T_2\colon R^3 \to R^3\) is the rotation about the z-axis through an angle \(\theta_2\).

21. (a) \(T_1\colon R^2 \to R^2\) is the reflection about the x-axis, and \(T_2\colon R^2 \to R^2\) is the reflection about the y-axis.

(b) \(T_1\colon R^2 \to R^2\) is the orthogonal projection on the x-axis, and \(T_2\colon R^2 \to R^2\) is the counterclockwise rotation through an angle \(\theta\).

(c) \(T_1\colon R^3 \to R^3\) is a dilation by a factor k, and \(T_2\colon R^3 \to R^3\) is the counterclockwise rotation about the z-axis through an angle \(\theta\).

Answer:

(a) \(T_1 \circ T_2 = T_2 \circ T_1\)

(b) \(T_1 \circ T_2 \neq T_2 \circ T_1\)

(c) \(T_1 \circ T_2 = T_2 \circ T_1\)

22. (Calculus required) Let

\[ D(f) = f'(x) \quad \text{and} \quad J(f) = \int_0^x f(t)\,dt \]

be the linear transformations in Examples 11 and 12 of Section 8.1. Find \((J \circ D)(f)\) for

(a) \(f(x) = x^2 + 3x + 2\)

(b) \(f(x) = \sin x\)

(c) \(f(x) = e^x + 3\)

23. (Calculus required) The Fundamental Theorem of Calculus implies that integration and differentiation reverse the actions of each other. Define a transformation \(D\colon P_n \to P_{n-1}\) by \(D(p(x)) = p'(x)\), and define \(J\colon P_{n-1} \to P_n\) by \(J(p(x)) = \int_0^x p(t)\,dt\).

(a) Show that D and J are linear transformations.

(b) Explain why J is not the inverse transformation of D.

(c) Can the domains and/or codomains of D and J be restricted so they are inverse linear transformations?

True-False Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 

(a) The composition of two linear transformations is also a linear transformation. 
Answer: 

True 

(b) If \(T_1\colon V \to V\) and \(T_2\colon V \to V\) are any two linear operators, then \(T_1 \circ T_2 = T_2 \circ T_1\).
Answer: 

False 

(c) The inverse of a linear transformation is a linear transformation. 
Answer: 



False 

(d) If a linear transformation T has an inverse, then the kernel of T is the zero subspace.
Answer:

True

(e) If \(T\colon R^2 \to R^2\) is the orthogonal projection onto the x-axis, then \(T^{-1}\colon R^2 \to R^2\) maps each point on the x-axis onto a line that is perpendicular to the x-axis.

Answer:

False

(f) If \(T_1\colon U \to V\) and \(T_2\colon V \to W\) are linear transformations, and if \(T_1\) is not one-to-one, then neither is \(T_2 \circ T_1\).

Answer: 

True 
8.4 Matrices for General Linear Transformations



In this section we will show that a general linear transformation from any \(n\)-dimensional vector space V to any \(m\)-dimensional vector space W can be performed using an appropriate matrix transformation from \(R^n\) to \(R^m\). This idea is used in computer computations since computers are well suited for performing matrix computations.



Matrices of Linear Transformations 

Suppose that V is an \(n\)-dimensional vector space, W is an \(m\)-dimensional vector space, and that \(T\colon V \to W\) is a linear transformation. Suppose further that B is a basis for V, that \(B'\) is a basis for W, and that for each vector \(\mathbf{x}\) in V, the coordinate matrices for \(\mathbf{x}\) and \(T(\mathbf{x})\) are \([\mathbf{x}]_B\) and \([T(\mathbf{x})]_{B'}\), respectively (Figure 8.4.1).



Figure 8.4.1 (T maps a vector in the \(n\)-dimensional space V into a vector in the \(m\)-dimensional space W; the corresponding coordinate vectors lie in \(R^n\) and \(R^m\).)

It will be our goal to find an \(m \times n\) matrix A such that multiplication by A maps the vector \([\mathbf{x}]_B\) into the vector \([T(\mathbf{x})]_{B'}\) for each \(\mathbf{x}\) in V (Figure 8.4.2a). If we can do so, then, as illustrated in Figure 8.4.2b, we will be able to execute the linear transformation T by using matrix multiplication and the following indirect procedure:

Finding T(x) Indirectly

Step 1. Compute the coordinate vector \([\mathbf{x}]_B\).

Step 2. Multiply \([\mathbf{x}]_B\) on the left by A to produce \([T(\mathbf{x})]_{B'}\).

Step 3. Reconstruct \(T(\mathbf{x})\) from its coordinate vector \([T(\mathbf{x})]_{B'}\).

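Figure 8.4.2 (a) T maps V into W, and multiplication by A maps \([\mathbf{x}]_B\) into \([T(\mathbf{x})]_{B'}\). (b) Direct computation of \(T(\mathbf{x})\) compared with the indirect route: (1) compute \([\mathbf{x}]_B\), (2) multiply by A, (3) reconstruct \(T(\mathbf{x})\).

The key to executing this plan is to find an \(m \times n\) matrix A with the property that

\[ A[\mathbf{x}]_B = [T(\mathbf{x})]_{B'} \tag{1} \]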



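For this purpose, let \(B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}\) be a basis for the \(n\)-dimensional space V and \(B' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}\) a basis for the \(m\)-dimensional space W. Since Equation 1 must hold for all vectors in V, it must hold, in particular, for the basis vectors in B; that is,

\[ A[\mathbf{u}_1]_B = [T(\mathbf{u}_1)]_{B'}, \quad A[\mathbf{u}_2]_B = [T(\mathbf{u}_2)]_{B'}, \quad \ldots, \quad A[\mathbf{u}_n]_B = [T(\mathbf{u}_n)]_{B'} \tag{2} \]

But

\[ [\mathbf{u}_1]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad [\mathbf{u}_2]_B = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad [\mathbf{u}_n]_B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

so

\[ A[\mathbf{u}_1]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} \]

\[ A[\mathbf{u}_2]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} \]

\[ \vdots \]

\[ A[\mathbf{u}_n]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \]

Substituting these results into 2 yields

\[ \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} = [T(\mathbf{u}_1)]_{B'}, \quad \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} = [T(\mathbf{u}_2)]_{B'}, \quad \ldots, \quad \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = [T(\mathbf{u}_n)]_{B'} \]

which shows that the successive columns of A are the coordinate vectors of

\[ T(\mathbf{u}_1), \; T(\mathbf{u}_2), \; \ldots, \; T(\mathbf{u}_n) \]

with respect to the basis \(B'\). Thus, the matrix A that completes the link in Figure 8.4.2a is

\[ A = \big[\, [T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \mid \cdots \mid [T(\mathbf{u}_n)]_{B'} \,\big] \tag{3} \]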

We will call this the matrix for T relative to the bases B and \(B'\) and will denote it by the symbol \([T]_{B',B}\). Using this notation, Formula 3 can be written as

\[ [T]_{B',B} = \big[\, [T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \mid \cdots \mid [T(\mathbf{u}_n)]_{B'} \,\big] \tag{4} \]

and from 1, this matrix has the property

\[ [T]_{B',B}[\mathbf{x}]_B = [T(\mathbf{x})]_{B'} \tag{5} \]

We leave it as an exercise to show that in the special case where \(T_A\colon R^n \to R^m\) is multiplication by A, and where B and \(B'\) are the standard bases for \(R^n\) and \(R^m\), respectively, then

\[ [T_A]_{B',B} = A \tag{6} \]

Remark Observe that in the notation \([T]_{B',B}\) the right subscript is a basis for the domain of T, and the left subscript is a basis for the image space of T (Figure 8.4.3). Moreover, observe how the subscript B seems to "cancel out" in Formula 5 (Figure 8.4.4).

Figure 8.4.3 (In \([T]_{B',B}\), the left subscript is a basis for the image space and the right subscript is a basis for the domain.)

Figure 8.4.4 (Cancellation of the subscript B in Formula 5.)

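EXAMPLE 1 Matrix for a Linear Transformation

Let \(T\colon P_1 \to P_2\) be the linear transformation defined by

\[ T(p(x)) = xp(x) \]

Find the matrix for T with respect to the standard bases

\[ B = \{\mathbf{u}_1, \mathbf{u}_2\} \quad \text{and} \quad B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} \]

where

\[ \mathbf{u}_1 = 1, \; \mathbf{u}_2 = x; \quad \mathbf{v}_1 = 1, \; \mathbf{v}_2 = x, \; \mathbf{v}_3 = x^2 \]

Solution From the given formula for T we obtain

\[ T(\mathbf{u}_1) = T(1) = (x)(1) = x, \qquad T(\mathbf{u}_2) = T(x) = (x)(x) = x^2 \]

By inspection, the coordinate vectors for \(T(\mathbf{u}_1)\) and \(T(\mathbf{u}_2)\) relative to \(B'\) are

\[ [T(\mathbf{u}_1)]_{B'} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad [T(\mathbf{u}_2)]_{B'} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Thus, the matrix for T with respect to B and \(B'\) is

\[ [T]_{B',B} = \big[\, [T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \,\big] = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \]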


EXAMPLE 2 The Three-Step Procedure

Let \(T\colon P_1 \to P_2\) be the linear transformation in Example 1, and use the three-step procedure (compute \([\mathbf{x}]_B\), multiply by \([T]_{B',B}\), reconstruct \(T(\mathbf{x})\)) to perform the computation

\[ T(a + bx) = x(a + bx) = ax + bx^2 \]

Solution

Step 1. The coordinate matrix for \(\mathbf{x} = a + bx\) relative to the basis \(B = \{1, x\}\) is

\[ [\mathbf{x}]_B = \begin{bmatrix} a \\ b \end{bmatrix} \]

Step 2. Multiplying \([\mathbf{x}]_B\) by the matrix \([T]_{B',B}\) found in Example 1 we obtain

\[ [T]_{B',B}[\mathbf{x}]_B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ a \\ b \end{bmatrix} = [T(\mathbf{x})]_{B'} \]

Step 3. Reconstructing \(T(\mathbf{x}) = T(a + bx)\) from \([T(\mathbf{x})]_{B'}\) we obtain

\[ T(a + bx) = 0 + ax + bx^2 = ax + bx^2 \]

Although Example 2 is simple, the procedure that it illustrates is applicable to problems of great complexity.
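The three-step procedure translates directly into code. The sketch below is an illustration only: it assumes polynomials in P_1 and P_2 are stored as coefficient lists relative to the standard bases, and the function names are hypothetical.

```python
# Matrix for T(p(x)) = x p(x) from P_1 to P_2 relative to {1, x} and {1, x, x^2}.
T_MATRIX = [[0, 0],
            [1, 0],
            [0, 1]]


def coordinate_vector(a, b):
    """Step 1: coordinates of a + bx relative to the basis {1, x}."""
    return [a, b]


def multiply(matrix, vector):
    """Step 2: multiply the coordinate vector on the left by the matrix for T."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def reconstruct(coords):
    """Step 3: rebuild the polynomial (as a readable string) from its coordinates."""
    c0, c1, c2 = coords
    return f"{c0} + {c1}x + {c2}x^2"


def apply_T_indirectly(a, b):
    """Run steps 1 and 2 and return the coordinate vector of T(a + bx)."""
    return multiply(T_MATRIX, coordinate_vector(a, b))


coords = apply_T_indirectly(2, 5)
print(coords)               # [0, 2, 5]
print(reconstruct(coords))  # 0 + 2x + 5x^2, which is x(2 + 5x)
```

Only step 2 involves arithmetic; steps 1 and 3 are bookkeeping that converts between a polynomial and its coordinate vector.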

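EXAMPLE 3 Matrix for a Linear Transformation

Let \(T\colon R^2 \to R^3\) be the linear transformation defined by

\[ T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ -5x_1 + 13x_2 \\ -7x_1 + 16x_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

Find the matrix for the transformation T with respect to the bases \(B = \{\mathbf{u}_1, \mathbf{u}_2\}\) for \(R^2\) and \(B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) for \(R^3\), where

\[ \mathbf{u}_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 5 \\ 2 \end{bmatrix}; \qquad \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \]

Solution From the formula for T,

\[ T(\mathbf{u}_1) = \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix}, \qquad T(\mathbf{u}_2) = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} \]

Expressing these vectors as linear combinations of \(\mathbf{v}_1\), \(\mathbf{v}_2\), and \(\mathbf{v}_3\), we obtain (verify)

\[ T(\mathbf{u}_1) = \mathbf{v}_1 - 2\mathbf{v}_3, \qquad T(\mathbf{u}_2) = 3\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 \]

Thus,

\[ [T(\mathbf{u}_1)]_{B'} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}, \qquad [T(\mathbf{u}_2)]_{B'} = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix} \]

so

\[ [T]_{B',B} = \big[\, [T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \,\big] = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix} \]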
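The "(verify)" step in Example 3 amounts to solving a small linear system for each image vector. The following sketch, written under the assumption that the basis vectors of \(B'\) are placed as the columns of a matrix V, finds each coordinate vector by Gauss-Jordan elimination in exact arithmetic; the helper names `solve` and `mat_vec` are illustrative.

```python
from fractions import Fraction


def solve(M, b):
    """Solve the square system M c = b by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def mat_vec(A, x):
    """Product of a matrix and a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


A = [[0, 1], [-5, 13], [-7, 16]]            # standard matrix for T
basis_B = [[3, 1], [5, 2]]                  # u1, u2
V = [[1, -1, 0], [0, 2, 1], [-1, 2, 2]]     # columns are v1, v2, v3

# Each column of [T]_{B',B} is the B'-coordinate vector of T(u_j).
columns = [solve(V, mat_vec(A, u)) for u in basis_B]
T_matrix = [[columns[j][i] for j in range(2)] for i in range(3)]
print(T_matrix == [[1, 3], [0, 1], [-2, -1]])   # True
```

The computed columns match the coordinate vectors found by inspection in the example.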



Remark Example 3 illustrates that a fixed linear transformation generally has multiple representations, each depending on the bases chosen. In this case the matrices

\[ [T] = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix} \quad \text{and} \quad [T]_{B',B} = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix} \]

both represent the transformation T, the first relative to the standard bases for \(R^2\) and \(R^3\), the second relative to the bases B and \(B'\) stated in the example.



Matrices of Linear Operators 

In the special case where \(V = W\) (so that \(T\colon V \to V\) is a linear operator), it is usual to take \(B = B'\) when constructing a matrix for T. In this case the resulting matrix is called the matrix for T relative to the basis B and is usually denoted by \([T]_B\) rather than \([T]_{B,B}\). If \(B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}\), then Formulas 4 and 5 become

\[ [T]_B = \big[\, [T(\mathbf{u}_1)]_B \mid [T(\mathbf{u}_2)]_B \mid \cdots \mid [T(\mathbf{u}_n)]_B \,\big] \tag{7} \]

\[ [T]_B[\mathbf{x}]_B = [T(\mathbf{x})]_B \tag{8} \]

Phrased informally, Formulas 7 and 8 state that the matrix for T, when multiplied by the coordinate vector for \(\mathbf{x}\), produces the coordinate vector for \(T(\mathbf{x})\).

In the special case where \(T\colon R^n \to R^n\) is a matrix operator, say multiplication by A, and B is the standard basis for \(R^n\), then Formula 7 simplifies to

\[ [T]_B = A \tag{9} \]



Matrices of Identity Operators 



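Recall that the identity operator \(I\colon V \to V\) maps every vector in V into itself, that is, \(I(\mathbf{x}) = \mathbf{x}\) for every vector \(\mathbf{x}\) in V. The following example shows that if V is \(n\)-dimensional, then the matrix for I relative to any basis B for V is the \(n \times n\) identity matrix.

EXAMPLE 4 Matrices of Identity Operators

If \(B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}\) is a basis for a finite-dimensional vector space V, and if \(I\colon V \to V\) is the identity operator on V, then

\[ I(\mathbf{u}_1) = \mathbf{u}_1, \quad I(\mathbf{u}_2) = \mathbf{u}_2, \quad \ldots, \quad I(\mathbf{u}_n) = \mathbf{u}_n \]

Therefore,

\[ [I]_B = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I \]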



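EXAMPLE 5 Linear Operator on \(P_2\)

Let \(T\colon P_2 \to P_2\) be the linear operator defined by

\[ T(p(x)) = p(3x - 5) \]

that is, \(T(c_0 + c_1 x + c_2 x^2) = c_0 + c_1(3x - 5) + c_2(3x - 5)^2\).

(a) Find \([T]_B\) relative to the basis \(B = \{1, x, x^2\}\).

(b) Use the indirect procedure to compute \(T(1 + 2x + 3x^2)\).

(c) Check the result in (b) by computing \(T(1 + 2x + 3x^2)\) directly.

Solution

(a) From the formula for T,

\[ T(1) = 1, \quad T(x) = 3x - 5, \quad T(x^2) = (3x - 5)^2 = 9x^2 - 30x + 25 \]

so

\[ [T(1)]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad [T(x)]_B = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix}, \quad [T(x^2)]_B = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix} \]

Thus,

\[ [T]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \]

(b) Step 1. The coordinate matrix for \(\mathbf{p} = 1 + 2x + 3x^2\) relative to the basis \(B = \{1, x, x^2\}\) is

\[ [\mathbf{p}]_B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]

Step 2. Multiplying \([\mathbf{p}]_B\) by the matrix \([T]_B\) found in part (a) we obtain

\[ [T]_B[\mathbf{p}]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 66 \\ -84 \\ 27 \end{bmatrix} = [T(\mathbf{p})]_B \]

Step 3. Reconstructing \(T(\mathbf{p}) = T(1 + 2x + 3x^2)\) from \([T(\mathbf{p})]_B\) we obtain

\[ T(1 + 2x + 3x^2) = 66 - 84x + 27x^2 \]

(c) By direct computation,

\[ \begin{aligned} T(1 + 2x + 3x^2) &= 1 + 2(3x - 5) + 3(3x - 5)^2 \\ &= 1 + 6x - 10 + 27x^2 - 90x + 75 \\ &= 66 - 84x + 27x^2 \end{aligned} \]

which agrees with the result in (b).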
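Both routes in Example 5 can be reproduced in a few lines. In this sketch (helper names are illustrative assumptions; polynomials are coefficient lists with the constant term first), the columns of \([T]_B\) are built from the images \((3x - 5)^k\) of the basis vectors, as Formula 7 prescribes, and the matrix is then applied to the coordinate vector of \(1 + 2x + 3x^2\).

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_power(q, k):
    """Coefficients of q(x)^k."""
    result = [1]
    for _ in range(k):
        result = poly_mul(result, q)
    return result


def mat_vec(A, x):
    """Product of a matrix and a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


q = [-5, 3]   # the polynomial 3x - 5
# Column k is the coordinate vector of T(x^k) = (3x - 5)^k, padded to length 3.
columns = [poly_power(q, k) + [0] * (2 - k) for k in range(3)]
T_B = [[columns[j][i] for j in range(3)] for i in range(3)]

p = [1, 2, 3]                  # coordinates of 1 + 2x + 3x^2
indirect = mat_vec(T_B, p)     # steps 1 and 2 of the indirect procedure
direct = [sum(c * columns[k][i] for k, c in enumerate(p)) for i in range(3)]
print(T_B)
print(indirect, direct)        # both are [66, -84, 27]
```

The direct sum \(1 + 2(3x - 5) + 3(3x - 5)^2\) and the matrix product agree coefficient by coefficient.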



Matrices of Compositions and Inverse Transformations 

We will conclude this section by mentioning two theorems without proof that are generalizations of Formulas 4 and 7 of Section 4.10.

THEOREM 8.4.1

If \(T_1\colon U \to V\) and \(T_2\colon V \to W\) are linear transformations, and if B, \(B''\), and \(B'\) are bases for U, V, and W, respectively, then

\[ [T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}[T_1]_{B'',B} \tag{10} \]

THEOREM 8.4.2

If \(T\colon V \to V\) is a linear operator, and if B is a basis for V, then the following are equivalent.

(a) T is one-to-one.

(b) \([T]_B\) is invertible.

Moreover, when these equivalent conditions hold,

\[ [T^{-1}]_B = [T]_B^{-1} \tag{11} \]

Remark In 10, observe how the interior subscript \(B''\) (the basis for the intermediate space V) seems to "cancel out," leaving only the bases for the domain and image space of the composition as subscripts (Figure 8.4.5). This cancellation of interior subscripts suggests the following extension of Formula 10 to compositions of three linear transformations (Figure 8.4.6):

\[ [T_3 \circ T_2 \circ T_1]_{B',B} = [T_3]_{B',B'''}[T_2]_{B''',B''}[T_1]_{B'',B} \tag{12} \]

Figure 8.4.5 (Cancellation of the interior subscript in Formula 10.)

Figure 8.4.6 (Cancellation of the interior subscripts in Formula 12.)

The following example illustrates Theorem 8.4.1. 

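EXAMPLE 6 Composition

Let \(T_1\colon P_1 \to P_2\) be the linear transformation defined by

\[ T_1(p(x)) = xp(x) \]

and let \(T_2\colon P_2 \to P_2\) be the linear operator defined by

\[ T_2(p(x)) = p(3x - 5) \]

Then the composition \((T_2 \circ T_1)\colon P_1 \to P_2\) is given by

\[ (T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (3x - 5)p(3x - 5) \]

Thus, if \(p(x) = c_0 + c_1 x\), then

\[ \begin{aligned} (T_2 \circ T_1)(c_0 + c_1 x) &= (3x - 5)(c_0 + c_1(3x - 5)) \\ &= c_0(3x - 5) + c_1(3x - 5)^2 \end{aligned} \tag{13} \]

In this example, \(P_1\) plays the role of U in Theorem 8.4.1, and \(P_2\) plays the roles of both V and W; thus we can take \(B' = B''\) in 10 so that the formula simplifies to

\[ [T_2 \circ T_1]_{B',B} = [T_2]_{B'}[T_1]_{B',B} \tag{14} \]

Let us choose \(B = \{1, x\}\) to be the basis for \(P_1\) and choose \(B' = \{1, x, x^2\}\) to be the basis for \(P_2\). We showed in Examples 1 and 5 that

\[ [T_1]_{B',B} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad [T_2]_{B'} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \]

Thus, it follows from 14 that

\[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \tag{15} \]

As a check, we will calculate \([T_2 \circ T_1]_{B',B}\) directly from Formula 4. Since \(B = \{1, x\}\), it follows from Formula 4 with \(\mathbf{u}_1 = 1\) and \(\mathbf{u}_2 = x\) that

\[ [T_2 \circ T_1]_{B',B} = \big[\, [(T_2 \circ T_1)(1)]_{B'} \mid [(T_2 \circ T_1)(x)]_{B'} \,\big] \tag{16} \]

Using 13 yields

\[ (T_2 \circ T_1)(1) = 3x - 5 \quad \text{and} \quad (T_2 \circ T_1)(x) = (3x - 5)^2 = 9x^2 - 30x + 25 \]

From this and the fact that \(B' = \{1, x, x^2\}\), it follows that

\[ [(T_2 \circ T_1)(1)]_{B'} = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix} \quad \text{and} \quad [(T_2 \circ T_1)(x)]_{B'} = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix} \]

Substituting in 16 yields

\[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \]

which agrees with 15.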
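The matrix product in Example 6 is easy to confirm by machine. The sketch below uses an illustrative helper `mat_mul` (an assumption, not notation from the text) to multiply the matrix for \(T_2\) by the matrix for \(T_1\) and compares the result with the columns obtained directly from \((T_2 \circ T_1)(1) = 3x - 5\) and \((T_2 \circ T_1)(x) = 9x^2 - 30x + 25\).

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


T1_matrix = [[0, 0], [1, 0], [0, 1]]                  # matrix for T1 relative to B and B'
T2_matrix = [[1, -5, 25], [0, 3, -30], [0, 0, 9]]     # matrix for T2 relative to B'

by_theorem = mat_mul(T2_matrix, T1_matrix)            # Formula 14

# Direct route: coordinate vectors of 3x - 5 and 25 - 30x + 9x^2 placed as columns.
image_of_1 = [-5, 3, 0]
image_of_x = [25, -30, 9]
directly = [[image_of_1[i], image_of_x[i]] for i in range(3)]

print(by_theorem)               # [[-5, 25], [3, -30], [0, 9]]
print(by_theorem == directly)   # True
```

Note that the factors are multiplied in the same order as the composition is written, with the matrix for \(T_1\) on the right because \(T_1\) acts first.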



Concept Review 

• Matrix for a linear transformation relative to bases 

• Matrix for a linear operator relative to a basis 

• The three-step procedure for finding \(T(\mathbf{x})\)

Skills

• Find the matrix for a linear transformation relative to bases of V and W.

• For a linear transformation \(T\colon V \to W\), find \(T(\mathbf{x})\) using the matrix for T relative to bases of V and W.



Exercise Set 8.4 

1. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = x\,p(x)$.

(a) Find the matrix for T relative to the standard bases

$B = \{u_1, u_2, u_3\}$ and $B' = \{v_1, v_2, v_3, v_4\}$

where

$u_1 = 1,\ u_2 = x,\ u_3 = x^2$

$v_1 = 1,\ v_2 = x,\ v_3 = x^2,\ v_4 = x^3$

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies Formula 5 for every vector $x = c_0 + c_1 x + c_2 x^2$ in $P_2$.



Answer: 

(a) 0 0 0 
1 0 0 
0 1 0 
0 0 1 

2. Let $T: P_2 \to P_1$ be the linear transformation defined by

\[ T(a_0 + a_1 x + a_2 x^2) = (a_0 + a_1) - (2a_1 + 3a_2)x \]

(a) Find the matrix for T relative to the standard bases $B = \{1, x, x^2\}$ and $B' = \{1, x\}$ for $P_2$ and $P_1$.

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies Formula 5 for every vector $x = c_0 + c_1 x + c_2 x^2$ in $P_2$.

3. Let $T: P_2 \to P_2$ be the linear operator defined by

\[ T(a_0 + a_1 x + a_2 x^2) = a_0 + a_1(x - 1) + a_2(x - 1)^2 \]

(a) Find the matrix for T relative to the standard basis $B = \{1, x, x^2\}$ for $P_2$.

(b) Verify that the matrix $[T]_B$ obtained in part (a) satisfies Formula 8 for every vector $x = a_0 + a_1 x + a_2 x^2$ in $P_2$.
Answer: 



(a) 



1 -1 1 
0 1 -2 
0 0 1 



4. Let T\R^ _> be the linear operator defined by 



andletfl= {ui,U2} be the basis for which 



ai = [;] and U2 = [-j] 



(a) Find [T]b. 

(b) Veriiy that Formula 8 holds for every vector x 'mf^. 
5. Let T\I^ _» iJ^ be defined by 



x\ 4 2x2 
0 



(a) Find the matrix [ 7] p'.B relative to the bases 5 = {ui , U2 ) ™d ^' = » ▼2» ^3 } » where 





"1" 




"2" 




"3' 


▼1 = 


1 


. V2 = 


2 


. V3 = 


0 




1 




0 




0 



(b) Verify that Formula 5 holds for every vector in g^. 
Answer: 
(a) [ 0 0 

4 ' 

S 4 
3 3 

6. Let T\F? be the linear operator defined by 

T(xi,X2, xi) = (xi -X2, X2 — XUXI -3:3) 

(a) Find the matrix for T with respect to the basis B = {vi, V2, V3} , where 

VI = (1,0.1). ¥2 = (0.1.1). V3= (1.1.0) 

(b) Verify that Formula 8 holds for every vector x=(xi,X2, ^3) in R^. 

(c) Is T one-to-one? If so, find the matrix of 7*"^ with respect to the basis B. 

7. Let T:P2 — ► P2 be the linear operator defined by T(p(x)) = p{2x + 1), that is, 

r(co + cix + ^2^^) = Cf)-\-ci(2x^l^ + C2(2x + \y 

(^) Find [ T] s with respect to the basis 5 = 1 1 , | . 



Use the three-step procedure illustrated in Example 2 to compute7'^2 — 3a: + 4j:^ j. 

(c) 

Check the result obtained in part (b) by computing ^ — 5x -i- 4x^ j directly. 
Answer: 

(a) fi 1 r 

0 2 4 
0 0 4 

(b) 3 + \0x + \6x^ 

8. Let T:P2 —* P3 be the linear transformation defined by T(p(x)) = xp(x — 3), that is, 

tIcq + CIX+ C2X^^ = x(cQ + ciix — 3^ + C2ix — 3)^^ 

(a) Find [T]b\b relative to the bases B = |l, x. ar^j and 5' = |l, x, x^. j . 

*^^) Use the three-step procedure illustrated in Example 2 to compute + ^ ~ j. 
(c) 

Check the result obtained in part (b) by computing directly. 



' Let VI = j and V2 = 



-1 

4 



, and let 

H-l I] 

be the matrix for T\R^ — ► relative to the basis B = {vi, V2} . 



(a) Find [7(vi)] sand [T(v2)]b- 

(b) Find7(¥i) and7(v2). 

Find a formula for T\ 



(d) 



Use the formula obtained in (c) to compute T\ 



Answer: 



(b) 
(c) 

(d) 



M i 

7 7 

107 24 

7 7 



7 

83 
■7 



10. [3-210 
LetA^ 1 6 2 1 
-3 0 7 1 
B' = ^Wi, W2, W3^, where 



be the matrix for X.R^ —* relative to the bases B = {vi, V2, V3, V4) and 





0 




2 




1 




6 




1 




1 




4 




9 


VI = 


1 




-1 


. V3 = 


-1 




4 




1 




-1 




2 




2 




"o" 




■-7' 




■-6' 






Wl = 


8 


, W2 = 


8 


. W3 = 


9 








8 




1 




1 







(a) Find[r(vi)]£.', [T{v2)]b; [7'(v3)]bS and [r(v4)]B'. 

(b) Find T(yi ) , T(v2) , T(v2) , and 7(v4) . 
(c) 



Find a formula for T 



^3 
X4 



(d) 



/ 



Use the formula obtained in (c) to compute T 



2 
2 
0 
0 



11. 



Leti4 = 



1 3 -1 

2 0 5 
6-2 4 



be the matrix for T:P2 — ► P2 with respect to the basis B = {vj, V2, V3) , where 



vi = 3x + Zx^^ V2 = - 1 + 37: + 2x^^ V3 = 3 + 7;t + 2x^- 

Find [T-CvOl^, [rCv2)]B, and [T'Cvg)]^. 

(a) Find r(vi), ^(¥2), and TCvs). 

(b) Find a formula for Tj^ao + i^i^: + 132^^ J- 

Use the formula obtained in (c) to compute 7*^1 + t:^ J. 



Answer: 



(a) 


r 




3" 




-1" 


[7'(vi)]b = 


2 


. [^(v2)]b = 


0 


. [T(v3)]b = 


5 




6 




-2 




4 



(b) rCvi) = 16 + 51x + 19x2. y^^^j = _ 6 _ + 5;r2^ y^^^^ = 7 + 40, + ISx^ 

(c) 7(flo + flix + a^A = 239fln-161fli+289fl9 ^ 201fln- lllat +247^:, ^ ^ 61fln-31fli + 107^:, 

V / 2^ 8 12 

(d) 7(1 = 22 + 56x + 14^2 



12. Let 7*1 :Pi — » ^'2 the linear transformation defined by 
and let 7*2 : ^^2 ~* ^2 linear operator defined by 



be the standard bases for and 

(a) Find [7*2 o 7*1 ] B\B, [72] B', and [7*1 ] b\B 

(b) State a formula relating the matrices in part (a). 

(c) Verify that the matrices in part (a) satisfy the formula you stated in part(b). 

13. Let T\'.P\—¥ P2 be the linear transformation defined by 

7'lC<^0 + <^l^) = 2co-3ci3: 
and let 7*2:^^2 ~* ^3 linear transformation defined by 

Let 5= {l,x},5"=|l,:^,:t^},and5'=|l,7:,7:^,7:^j. 

(a) Find [T2 o Ti ] 5;^. [T2]b'\B*'. and [Ti ] 5'' 5. 

(b) State a formula relating the matrices in part (a). 

(c) Verify that the matrices in part (a) satisfy the formula you stated in part(b). 
Answer: 



(a) 



0 0 
6 0 
0 -9 

0 0 



. U'2\b',B'' = 



0 0 0 

3 0 0 

0 3 0 

0 0 3 



. [7*1 = 



2 0 
0 -3 
0 0 



(b) W2oT\]b\b= [7'2]b',B"[7i]b",B 



14. Show that if 7*: f — » (f is the zero transformation, then the matrix for T with respect to any bases for V and \y is a zero 
matrix. 

1 5. Show that if 7 ]/ . Y is a contraction or a dilation of V (Example 4) of Section 8.1), then the matrix for T relative to 
any basis for V is a positive scalar multiple of the identity matrix. 

16. Let B = {v\, V2, V3, V4) be a basis for a vector space V. Find the matrix with respect to B of the linear operator 
T:V-*V defined by T{y\) = V2, 7'(v2) = vs, T(y2) = V4, ^(¥4) = vi . 

17. Prove that if B and fl' are the standard bases for R" and respectively, then the matrix for a linear transformation 
T:S" — » J?" relative to the bases B and B' is the standard matrix for T. 

18. (Calculus required) Let D: P2 —* P2 the differentiation operator ^(P) — p'i^) . In parts (a) and (b), find the 
matrix of D relative to the basis B= {pi, P2, P3) • 

(a) Pi = l, P2 = '. P3 = '^ 

(b) PI = 2, p2 = 2-3x. p3 = 2-3ar + 8x^ 

Use the matrix in part (a) to compute d(6 — 6x + 24x^ J. 

(d) Repeat the directions for part (c) for the matrix in part (b). 

19. (Calculus required) In each part, suppose that B= {f 1, £2, f 3} is a basis for a subspace V of the vector space of 
real-valued functions defined on the real line. Find the matrix with respect to B for differentiation operator D.V —*V- 

(a) fi = l, f2 = smx, f3 = cosar 

(b) fi = l. f2 = e'. f3=«^ 

(c) fi=«2', f2 = x«'^. £3 = ^2* 



4e ^ 6xe ^ — \0x e ^\ 



Answer: 

(a) 0 
0 
0 

0 



(b) 



(c) 



(d) 



0 
0 
1 

0 
0 1 
0 0 

2 1 
0 2 
0 0 



0 
-1 
0 

0 
0 
2 

0 
2 
2 



2 2x 



since 



"2 


1 


0" 


4" 




14" 


0 


2 


2 


6 




-8 


0 


0 


2 


-10 




-20 



20. Let Vbe a four-dimensional vector space with basis B, let Wbe a seven-dimensional vector space with basis B'\ and let 
7' _» be a linear transformation. Identify the four vector spaces that contain the vectors at the comers of the 
accompanying diagram. 

Direct 



compiitation 



(3) 



T Multiply by I rj^.^ 
Mir ^ tWli, 

Figure Ex-20 

21. In each part, fill in the missing part of the equation. 

(a) [T2oTi]B\B=['r2]J-[Ti]B",B 

(b) [T3oT2oTi]s'B= [^3] ^[7'2]b"',B"[7'i]b",b 

Answer: 

(a) B\ 5" 

(b) B\ 5"' 

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer.

(a) If the matrix of a linear transformation $T: V \to W$ relative to some bases of V and W is $\begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix}$, then there is a nonzero vector x in V such that $T(x) = 2x$.
Answer:

False

(b) If the matrix of a linear transformation $T: V \to W$ relative to bases for V and W is $\begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix}$, then there is a nonzero vector x in V such that $T(x) = 4x$.
Answer:

False

If the matrix of a linear transformation T'y'^fy relative to certain bases for V and W is ^ J' ^^^^ ^ one-to-one. 

Answer: 

True 

(d) If $S: V \to V$ and $T: V \to V$ are linear operators and B is a basis for V, then the matrix of $S \circ T$ relative to B is $[T]_B[S]_B$.

Answer: 

False 

(e) If $T: V \to V$ is an invertible linear operator and B is a basis for V, then the matrix for $T^{-1}$ relative to B is $[T]_B^{-1}$.
Answer:

True

8.5 Similarity



The matrix for a linear operator $T: V \to V$ depends on the basis selected for V. One of the fundamental problems of linear
algebra is to choose a basis for V that makes the matrix for T as simple as possible: a diagonal or a triangular matrix, for
example. In this section we will study this problem.



Simple Matrices for Linear Operators 



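Standard bases do not necessarily produce the simplest matrices for linear operators. For example, consider the matrix
operator $T: R^2 \to R^2$ whose standard matrix is

\[ [T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \tag{1} \]

and view $[T]$ as the matrix for T relative to the standard basis $B = \{e_1, e_2\}$ for $R^2$. Let us compare this to the matrix for
T relative to the basis $B' = \{u_1', u_2'\}$ for $R^2$ in which

\[ u_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \tag{2} \]

Since

\[ T(u_1') = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} = 2u_1' \quad\text{and}\quad T(u_2') = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3u_2' \]

it follows that

\[ [T(u_1')]_{B'} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad\text{and}\quad [T(u_2')]_{B'} = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \]

so the matrix for T relative to the basis B' is

\[ [T]_{B'} = \big[\, [T(u_1')]_{B'} \mid [T(u_2')]_{B'} \,\big] = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

This matrix, being diagonal, has a simpler form than $[T]$ and conveys clearly that the operator T scales $u_1'$ by a factor of 2
and $u_2'$ by a factor of 3, information that is not immediately evident from $[T]$.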
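
The column-by-column construction of $[T]_{B'}$ can be sketched in plain Python. This is an illustrative sketch only: the helper names coords and apply are hypothetical, and the basis vectors $u_1' = (1, 1)$ and $u_2' = (1, 2)$ are the ones identified later in Example 1 of this section.

```python
from fractions import Fraction

def coords(v, b1, b2):
    """Return [c1, c2] with v = c1*b1 + c2*b2, using Cramer's rule (2x2 only)."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    c1 = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    c2 = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return [c1, c2]

def apply(M, v):
    """Multiply a 2x2 matrix (list of rows) by a vector in R^2."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

T = [[1, 1], [-2, 4]]        # standard matrix from Formula (1)
u1, u2 = [1, 1], [1, 2]      # basis B'

# Each column of [T]_{B'} is the B'-coordinate vector of T applied to a basis vector.
col1 = coords(apply(T, u1), u1, u2)
col2 = coords(apply(T, u2), u1, u2)
T_Bprime = [[col1[0], col2[0]],
            [col1[1], col2[1]]]
print(T_Bprime)
```

Working with Fraction keeps the coordinates exact, so the diagonal form is recovered without rounding.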



One of the major themes in more advanced linear algebra courses is to determine the "simplest possible form" that can be 
obtained for the matrix of a linear operator by choosing the basis appropriately. Sometimes it is possible to obtain a 
diagonal matrix (as above, for example), whereas other times one must settle for a triangular matrix or some other form. 
We will only be able to touch on this important topic in this text. 

The problem of finding a basis that produces the simplest possible matrix for a linear operator $T: V \to V$ can be attacked by
first finding a matrix for T relative to any basis, typically a standard basis, where applicable, and then changing the basis in
a way that simplifies the matrix. Before pursuing this idea, it will be helpful to revisit some concepts about changing bases.



A New View of Transition Matrices 

Recall from Formulas 7 and 8 of Section 4.6 that if $B = \{u_1, u_2, \ldots, u_n\}$ and $B' = \{u_1', u_2', \ldots, u_n'\}$ are bases for a vector
space V, then the transition matrices from B to B' and from B' to B are

\[ P_{B \to B'} = \big[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\big] \tag{3} \]

\[ P_{B' \to B} = \big[\, [u_1']_{B} \mid [u_2']_{B} \mid \cdots \mid [u_n']_{B} \,\big] \tag{4} \]

where the matrices $P_{B \to B'}$ and $P_{B' \to B}$ are inverses of each other. We also showed in Formulas 9 and 10 of that section
that if v is any vector in V, then

\[ P_{B \to B'}\,[v]_B = [v]_{B'} \tag{5} \]

\[ P_{B' \to B}\,[v]_{B'} = [v]_B \tag{6} \]

The following theorem shows that transition matrices in Formulas 3 and 4 can be viewed as matrices for identity operators.



THEOREM 8.5.1 

If B and B' are bases for a finite-dimensional vector space V, and if $I: V \to V$ is the identity operator on V, then

\[ P_{B \to B'} = [I]_{B',B} \quad\text{and}\quad P_{B' \to B} = [I]_{B,B'} \]

Proof Suppose that $B = \{u_1, u_2, \ldots, u_n\}$ and $B' = \{u_1', u_2', \ldots, u_n'\}$ are bases for V. Using the fact that $I(v) = v$ for all
v in V, it follows from Formula 4 of Section 8.4 that

\[ [I]_{B',B} = \big[\, [I(u_1)]_{B'} \mid [I(u_2)]_{B'} \mid \cdots \mid [I(u_n)]_{B'} \,\big] = \big[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\big] = P_{B \to B'} \quad \text{[Formula (3) above]} \]

The proof that $[I]_{B,B'} = P_{B' \to B}$ is similar.



Effect of Changing Bases on Matrices of Linear Operators 

We are now ready to consider the main problem in this section. 

PROBLEM

If B and B' are two bases for a finite-dimensional vector space V, and if $T: V \to V$ is a linear operator, what
relationship, if any, exists between the matrices $[T]_B$ and $[T]_{B'}$?

The answer to this question can be obtained by considering the composition of the three linear operators on V pictured in 
Figure 8.5.1. 



Figure 8.5.1 (the composition of I, T, and I on V, with basis B' assigned to the first and last copies of V and basis B to the middle two)

In this figure, v is first mapped into itself by the identity operator, then v is mapped into $T(v)$ by T, and then $T(v)$ is
mapped into itself by the identity operator. All four vector spaces involved in the composition are the same (namely, V), but
the bases for the spaces vary. Since the starting vector is v and the final vector is $T(v)$, the composition produces the same
result as applying T directly; that is,

\[ T = I \circ T \circ I \tag{7} \]

If, as illustrated in Figure 8.5.1, the first and last vector spaces are assigned the basis B' and the middle two spaces are
assigned the basis B, then it follows from (7) and Formula 12 of Section 8.4 (with an appropriate adjustment to the names of
the bases) that

\[ [T]_{B',B'} = [I \circ T \circ I]_{B',B'} = [I]_{B',B}\,[T]_{B,B}\,[I]_{B,B'} \tag{8} \]

or, in simpler notation,

\[ [T]_{B'} = [I]_{B',B}\,[T]_B\,[I]_{B,B'} \tag{9} \]

We can simplify this formula even further by using Theorem 8.5.1 to rewrite it as

\[ [T]_{B'} = P_{B \to B'}\,[T]_B\,P_{B' \to B} \tag{10} \]

In summary, we have the following theorem.

THEOREM 8.5.2

Let $T: V \to V$ be a linear operator on a finite-dimensional vector space V, and let B and B' be bases for V. Then

\[ [T]_{B'} = P^{-1}\,[T]_B\,P \tag{11} \]

where $P = P_{B' \to B}$ and $P^{-1} = P_{B \to B'}$.
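
Formula (11) can be sketched as a small routine in plain Python. This is an illustrative sketch under stated assumptions, not a method given in the text: the function names inverse and change_basis are hypothetical, exact arithmetic uses the standard fractions module, and P is assumed to be invertible (which every transition matrix is). The test data are the standard matrix from Formula (1) and the matrix P whose columns are the coordinates of $u_1' = e_1 + e_2$ and $u_2' = e_1 + 2e_2$.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals.

    Assumes M is invertible; a singular M makes the pivot search fail.
    """
    n = len(M)
    # Augment M with the identity matrix.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Swap a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear the rest of the column.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def change_basis(T_B, P):
    """Return P^{-1} [T]_B P, where P is the transition matrix from B' to B."""
    return mat_mul(inverse(P), mat_mul(T_B, P))

T_B = [[1, 1], [-2, 4]]   # [T] relative to the standard basis B
P = [[1, 1], [1, 2]]      # columns: [u1']_B and [u2']_B
T_Bprime = change_basis(T_B, P)
print(T_Bprime)
```

Note the order of the factors: P (from B' to B) sits on the right and its inverse on the left, which is the point of the warning that follows.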



Warning When applying Theorem 8.5.2, it is easy to forget whether $P = P_{B' \to B}$ (correct) or $P = P_{B \to B'}$ (incorrect). It
may help to use the diagram in Figure 8.5.2 and observe that the exterior subscripts of the transition matrices match the
subscript of the matrix they enclose.

\[ [T]_{B'} = P_{B \to B'}\,[T]_B\,P_{B' \to B} \]

Figure 8.5.2 (exterior subscripts)



In the terminology of Definition 1 of Section 5.2, Theorem 8.5.2 tells us that matrices representing the same linear operator 
relative to different bases must be similar. The following theorem is a rephrasing of Theorem 8.5.2 in the language of 
similarity. 



THEOREM 8.5.3 

Two matrices, A and B, are similar if and only if they represent the same linear operator. Moreover, if $B = P^{-1}AP$,
then P is the transition matrix from the basis relative to matrix B to the basis relative to matrix A.



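EXAMPLE 1 Similar Matrices Represent the Same Linear Operator

We showed at the beginning of this section that the matrices

\[ C = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

represent the same linear operator $T: R^2 \to R^2$. Verify that these matrices are similar by finding a matrix P for
which $D = P^{-1}CP$.

Solution We need to find the transition matrix

\[ P = P_{B' \to B} = \big[\, [u_1']_B \mid [u_2']_B \,\big] \]

where $B' = \{u_1', u_2'\}$ is the basis for $R^2$ given by (2) and $B = \{e_1, e_2\}$ is the standard basis for $R^2$. We see by
inspection that

\[ u_1' = e_1 + e_2, \qquad u_2' = e_1 + 2e_2 \]

from which it follows that

\[ [u_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad [u_2']_B = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

Thus,

\[ P = P_{B' \to B} = \big[\, [u_1']_B \mid [u_2']_B \,\big] = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \]

We leave it for you to verify that

\[ P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \]

and hence that

\[ \underbrace{\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}}_{D} = \underbrace{\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}}_{P^{-1}} \underbrace{\begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}}_{C} \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}}_{P} \]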



Similarity Invariants 



Recall from Section 5.2 that a property of a square matrix is called a similarity invariant if that property is shared by all
similar matrices. In Table 1 of that section (table reproduced below), we listed the most important similarity invariants.
Since we know from Theorem 8.5.3 that two matrices are similar if and only if they represent the same linear operator
$T: V \to V$, it follows that if B is a basis for V, then every similarity invariant property of $[T]_B$ is also a similarity
invariant property of $[T]_{B'}$ for any other basis B' for V. For example, for any two bases B and B' we must have

\[ \det([T]_B) = \det([T]_{B'}) \]

It follows from this equation that the value of the determinant depends on T, but not on the particular basis that is used to
obtain the matrix for T. Thus, the determinant can be regarded as a property of the linear operator T; indeed, if V is a
finite-dimensional vector space, then we can define the determinant of the linear operator T to be

\[ \det(T) = \det([T]_B) \tag{12} \]

where B is any basis for V.

Table 1 Similarity Invariants

Property | Description
Determinant | $A$ and $P^{-1}AP$ have the same determinant.
Invertibility | $A$ is invertible if and only if $P^{-1}AP$ is invertible.
Rank | $A$ and $P^{-1}AP$ have the same rank.
Nullity | $A$ and $P^{-1}AP$ have the same nullity.
Trace | $A$ and $P^{-1}AP$ have the same trace.
Characteristic polynomial | $A$ and $P^{-1}AP$ have the same characteristic polynomial.
Eigenvalues | $A$ and $P^{-1}AP$ have the same eigenvalues.
Eigenspace dimension | If $\lambda$ is an eigenvalue of $A$ and $P^{-1}AP$, then the eigenspace of $A$ corresponding to $\lambda$ and the eigenspace of $P^{-1}AP$ corresponding to $\lambda$ have the same dimension.



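EXAMPLE 2 Determinant of a Linear Operator

At the beginning of this section we showed that the matrices

\[ [T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad [T]_{B'} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

represent the same linear operator relative to different bases, the first relative to the standard basis $B = \{e_1, e_2\}$
for $R^2$ and the second relative to the basis $B' = \{u_1', u_2'\}$ for which

\[ u_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

This means that $[T]$ and $[T]_{B'}$ must be similar matrices and hence must have the same similarity invariant
properties. In particular, they must have the same determinant. We leave it for you to verify that

\[ \det[T] = \begin{vmatrix} 1 & 1 \\ -2 & 4 \end{vmatrix} = 6 \quad\text{and}\quad \det[T]_{B'} = \begin{vmatrix} 2 & 0 \\ 0 & 3 \end{vmatrix} = 6 \]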
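
The verification left to the reader can be sketched in a few lines of plain Python. The helper names det2 and trace are hypothetical; the trace is included only because Table 1 lists it as another similarity invariant.

```python
def det2(M):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

T_B = [[1, 1], [-2, 4]]       # [T] relative to the standard basis
T_Bprime = [[2, 0], [0, 3]]   # [T] relative to B'

print(det2(T_B), det2(T_Bprime))     # both determinants
print(trace(T_B), trace(T_Bprime))   # both traces
```

Agreement of these values does not by itself prove that two matrices are similar; it only confirms that the invariants are consistent with similarity.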



EXAMPLE 3 Eigenvalues and Bases for Eigenspaces

Find the eigenvalues and bases for the eigenspaces of the linear operator $T: P_2 \to P_2$ defined by

\[ T(a + bx + cx^2) = -2c + (a + 2b + c)x + (a + 3c)x^2 \]

Solution We leave it for you to show that the matrix for T with respect to the standard basis $B = \{1, x, x^2\}$ is

\[ [T]_B = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

The eigenvalues of T are $\lambda = 1$ and $\lambda = 2$ (Example 7 of Section 5.1). Also from that example, the
eigenspace of $[T]_B$ corresponding to $\lambda = 2$ has the basis $\{u_1, u_2\}$, where

\[ u_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

and the eigenspace of $[T]_B$ corresponding to $\lambda = 1$ has the basis $\{u_3\}$, where

\[ u_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \]

The matrices $u_1$, $u_2$, and $u_3$ are the coordinate matrices relative to B of

\[ p_1 = -1 + x^2, \quad p_2 = x, \quad p_3 = -2 + x + x^2 \]

Thus, the eigenspace of T corresponding to $\lambda = 2$ has the basis

\[ \{p_1, p_2\} = \{-1 + x^2,\ x\} \]

and that corresponding to $\lambda = 1$ has the basis

\[ \{p_3\} = \{-2 + x + x^2\} \]

As a check, you can use the given formula for T to verify that

\[ T(p_1) = 2p_1, \quad T(p_2) = 2p_2, \quad\text{and}\quad T(p_3) = p_3 \]
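
The closing check can be sketched in plain Python by storing a polynomial $a + bx + cx^2$ as the coefficient list [a, b, c]. This is an illustrative sketch: the function T below encodes the operator as read off from the columns of $[T]_B$ (the images of 1, x, and $x^2$), and the helper name scale is hypothetical.

```python
def T(p):
    """Apply T to p = [a, b, c], representing a + b*x + c*x^2.

    The rule is read off from the columns of [T]_B:
    T(1) = x + x^2, T(x) = 2x, T(x^2) = -2 + x + 3x^2.
    """
    a, b, c = p
    return [-2 * c, a + 2 * b + c, a + 3 * c]

def scale(k, p):
    """Multiply a polynomial (coefficient list) by the scalar k."""
    return [k * coeff for coeff in p]

p1 = [-1, 0, 1]   # -1 + x^2
p2 = [0, 1, 0]    # x
p3 = [-2, 1, 1]   # -2 + x + x^2

print(T(p1) == scale(2, p1))   # eigenvalue 2
print(T(p2) == scale(2, p2))   # eigenvalue 2
print(T(p3) == p3)             # eigenvalue 1
```

Because the coefficient list of a polynomial is exactly its coordinate vector relative to $\{1, x, x^2\}$, this is the same computation as multiplying $[T]_B$ by $u_1$, $u_2$, and $u_3$.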



Concept Review 

• Similarity of matrices representing a linear operator 

• Similarity invariant 

• Determinant of a linear operator 

Skills 

• Show that two matrices A and B represent the same linear operator, and find a transition matrix P so that $B = P^{-1}AP$.

• Find the eigenvalues and bases for the eigenspaces of a linear operator on a finite-dimensional vector space.



Exercise Set 8.5 



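In Exercises 1-7, find the matrix for T relative to the basis B, and use Theorem 8.5.2 to compute the matrix for T relative
to the basis B'.

1. $T: R^2 \to R^2$ is defined by

\[ T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1 - 2x_2 \\ -x_2 \end{bmatrix} \]

and $B = \{u_1, u_2\}$ and $B' = \{v_1, v_2\}$, where

\[ u_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}; \quad v_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -3 \\ 4 \end{bmatrix} \]

Answer:

\[ [T]_B = \begin{bmatrix} 1 & -2 \\ 0 & -1 \end{bmatrix}, \quad [T]_{B'} = \begin{bmatrix} -\frac{3}{11} & -\frac{56}{11} \\ -\frac{2}{11} & \frac{3}{11} \end{bmatrix} \]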



2. T.R^ — ► F? is defined by 

and 5= {ui,U2) and S' = |vi, V2|, where 

Ql = 







x\ +7x2 


/2_ 


)= 


3x1 -4x2 



"2" 




4" 




r 




-r 


_2_ 


. «2 = 


_-l_ 




_3_ 


. V2 = 





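3. $T: R^2 \to R^2$ is the rotation about the origin through an angle of 45°; B and B' are the bases in Exercise 1.
Answer:

\[ [T]_B = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}, \quad [T]_{B'} = \begin{bmatrix} \frac{13}{11\sqrt{2}} & -\frac{25}{11\sqrt{2}} \\ \frac{5}{11\sqrt{2}} & \frac{9}{11\sqrt{2}} \end{bmatrix} \]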



4. 7':^-' — ► j?-' is defined by 



^1 
^2 
^3 



XI +2x2-X3 

XI I 7x3 



and B is the standard basis for and 5' = ^vj , V2, V3 1 , where 

VI = 



'1" 




T 




T 


0 


. V2 = 


1 


. V3 = 


1 


0 




0 




1 



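5. $T: R^3 \to R^3$ is the orthogonal projection on the xy-plane, and B and B' are as in Exercise 4.

Answer:

\[ [T]_B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad [T]_{B'} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \]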

6. $T: R^2 \to R^2$ is defined by $T(x) = 5x$, and B and B' are the bases in Exercise 2.

7. r:-Pi -^P\ is defined by r((afo +-31^) =(30 + 1), and 5= {pi, P2} and 5' = |qi, q2}, where pi = 6 + 3?: 
,P2 = 10 + 27r, qi = 2,q2 = 3 + 27:. 



Answer: 



[T]b= 



2 
3 
1 

2 



4 . [^b' = 




8. Find det(0. 

(a) where 7(7:1, ;^2) = (3^1 -4:^2, -^1 I ^^2) 

(b) 7:i^-^^/5^,where ri:>:i,.T2, :^3) = (^1 - ^2 - ^3 - ^l) 

(c) 7: ^2 • P2^ where 7(;. (7:) ) = (7: - 1 ) 

9. Prove that the following are similarity invariants: 

(a) rank 

(b) nullity 

(c) invertibility 

10. Let $T: P_4 \to P_4$ be the linear operator given by the formula $T(p(x)) = p(2x + 1)$.

(a) Find a matrix for T relative to some convenient basis, and then use it to find the rank and nullity of T. 

(b) Use the result in part (a) to determine whether T is one-to-one. 

11. In each part, find a basis for ^ relative to which the matrix for T is diagonal. 




(b) 




3k \ +7:2 



Answer: 




12. In each part, find a basis for ^ relative to which the matrix for T is diagonal. 




x\ — 2x2 — ^3 
-XI - X2-2x2 

-X2^X2^ 
-7:1+7:3 
7:1+7:2 

47:1 +7:3 
27:1+37:2 + 27:3 
7:1 +47:3 



13. Let T\P2 P2 be defined by 



TiaQ + aix+a2X^^ = (5aQ + 6ai + 2fl2) 



(a) Find the eigenvalues of T. 

(b) Find bases for the eigenspaces of T. 

Answer: 



(a) A=-4. A = 3 

(b) Basis for eigenspace corresponding toA=— 4: — 2 + ^x + x^; basis for eigenspace corresponding to 



A = 3: 5~2x + x^ 

14. Let T: M22 M22 be defined by 



(a) Find the eigenvalues of T. 

(b) Find bases for the eigenspaces of T. 



15. Let $\lambda$ be an eigenvalue of a linear operator $T: V \to V$. Prove that the eigenvectors of T corresponding to $\lambda$ are the
nonzero vectors in the kernel of $\lambda I - T$.

16. (a) Prove that if A and B are similar matrices, then $A^2$ and $B^2$ are also similar. More generally, prove that $A^k$ and $B^k$
are similar if k is any positive integer.

(b) If $A^2$ and $B^2$ are similar, must A and B be similar? Explain.

17. Let C and D be $m \times n$ matrices, and let $B = \{v_1, v_2, \ldots, v_n\}$ be a basis for a vector space V. Show that if
$C[x]_B = D[x]_B$ for all x in V, then $C = D$.

18. Find two nonzero 2x2 matrices that are not similar, and explain why they are not. 

19. Complete the proof below by justifying each step. 

Hypothesis: A and B are similar matrices. 

Conclusion: A and B have the same characteristic polynomial. 



Proof: 

1. det(A/-5) = det(A/--P"^^j 



4, 



5 



2, 



3 




6. =det(A/-^) 

20. If A and B are similar matrices, say $B = P^{-1}AP$, then it follows from Exercise 19 that A and B have the same
eigenvalues. Suppose that $\lambda$ is one of the common eigenvalues and x is a corresponding eigenvector of A. See if you can
find an eigenvector of B corresponding to $\lambda$ (expressed in terms of $\lambda$, x, and P).

21. Since the standard basis for $R^n$ is so simple, why would one want to represent a linear operator on $R^n$ in another basis?
Answer: 

The choice of an appropriate basis can yield a better understanding of the linear operator. 

22. Prove that trace is a similarity invariant. 

True-False Exercises 

In parts (a) — (h) determine whether the statement is true or false, and justify your answer. 

(a) A matrix cannot be similar to itself 
Answer: 

False 

(b) If A is similar to 5, and B is similar to C, then A is similar to C. 
Answer: 

True 

(c) If A and B are similar and B is singular, then A is singular. 
Answer: 

True 

(d) If A and B are invertible and similar, then $A^{-1}$ and $B^{-1}$ are similar.
Answer: 

True 

(e) If $T_1: R^n \to R^n$ and $T_2: R^n \to R^n$ are linear operators, and if $[T_1]_{B',B} = [T_2]_{B',B}$ with respect to two bases B and B'
for $R^n$, then $T_1(x) = T_2(x)$ for every vector x in $R^n$.
Answer:

True

(f) If $T_1: R^n \to R^n$ is a linear operator, and if $[T_1]_B = [T_1]_{B'}$ with respect to two bases B and B' for $R^n$, then $B = B'$.
Answer:

False

(g) If $T: R^n \to R^n$ is a linear operator, and if $[T]_B = I_n$ with respect to some basis B for $R^n$, then T is the identity operator
on $R^n$.

Answer: 

True 

(h) If $T: R^n \to R^n$ is a linear operator, and if $[T]_{B',B} = I_n$ with respect to two bases B and B' for $R^n$, then T is the identity
operator on $R^n$.

Answer: 

False 



Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



Chapter 8 Supplementary Exercises 



1. Let A be an $n \times n$ matrix, B a nonzero $n \times 1$ matrix, and x a vector in $R^n$ expressed in matrix notation. Is
$T(x) = Ax + B$ a linear operator on $R^n$? Justify your answer.

Answer:

No. $T(x_1 + x_2) = A(x_1 + x_2) + B \neq (Ax_1 + B) + (Ax_2 + B) = T(x_1) + T(x_2)$, and if $c \neq 1$, then
$T(cx) = cAx + B \neq c(Ax + B) = cT(x)$.



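2. Let

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

(a) Show that

\[ A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix} \quad\text{and}\quad A^3 = \begin{bmatrix} \cos 3\theta & -\sin 3\theta \\ \sin 3\theta & \cos 3\theta \end{bmatrix} \]

(b) Based on your answer to part (a), make a guess at the form of the matrix $A^n$ for any positive integer n.

(c) By considering the geometric effect of multiplication by A, obtain the result in part (b) geometrically.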

3. Let $T: V \to V$ be defined by $T(v) = \|v\|v$. Show that T is not a linear operator on V.

4. Let VI, V2, be fixed vectors in R^, and let T.R^ — ► be the function defined by 
^(x) = (x - vi, X - V2, X - v^), where x - Vj is the Euclidean inner product on 

(a) Show that T is a linear transformation. 

(b) Show that the matrix with row vectors , V2, - . ., is the standard matrix for T. 

5. Let {ei, 62, 63, 64} be the standard basis for /J*^, and let T.R^ — ► E? be the linear transformation for 
which 

7(61) = (1,2,1), 7(62) = (0, 1, 0), 
T(e3) = (1, 3. 0). 7(84) = (1,1.1) 

(a) Find bases for the range and kernel of T. 

(b) Find the rank and nullity of T. 

Answer: 



(a) 7'(e3) and any two of ^(ei), 7^(62), and 7'(e4) form bases for the range; ( — 1,1,0, l)isa basis 
for the kernel. 

(b) Rank = 3, nullity = 1 

6. Suppose that vectors in p} are denoted by 1 x 3 matrices, and define T\R^ — ► by 

-1 2 4" 

X2 :^3]) = [^1 ^2 ^2]\ 3 0 1 

2 2 5 



(a) Find a basis for the kernel of T. 

(b) Find a basis for the range of T. 

7. Let B = {vi, V2, V3, V4} be a basis for a vector space V, and let X:V ^V^^ the linear operator for 
which 



(a) Find the rank and nullity of T. 

(b) Determine whether T is one-to-one. 

Answer: 

(a) Rank(0 = 2 and nullity(0 = 2 

(b) T is not one-to-one. 

8. Let V and W be vector spaces, let r, Ti , and T2 be linear transformations from V to W, and let ^ be a scalar. 
Define new transformations, Ti + T2 and kT-> by the formulas 



(a) Show that (7*1 + T2) :V—^W and kT: P'' — ► IF are both linear transformations. 

(b) Show that the set of all linear transformations from V to with the operations in part (a) is a vector 



9. Let A and B be similar matrices. Prove: 

(a) ji^ and are similar. 

(b) If A and B are invertible, then and are similar. 

10. Fredholm Alternative Theorem Let 7'; _ ^ be a linear operator on an n-dimensional vector space. 
Prove that exactly one of the following statements holds: 

(i) The equation ^(x) = b has a solution for all vectors b in V. 

in) Nullity of 7 > Q. 

11. Let T: M22 M22 be the linear operator defined by 



VI +V2 + V3 + 3V4 
VI — V2 + 2v3 + 2v4 
2vi — 4v2 + 5v3 + 3v4 
— 2vi + 6v2 — 6v3 — 2v4 



(7'i4T2)(x) = Ti(x) + 72(x) 
ikT)(,x)=k(T(x)) 



Space. 




Find the rank and nullity of T. 



Answer: 



Rank = 3, niilli1y=l 

12. Prove: If A and B are similar matrices, and if B and C are also similar matrices, then A and C are similar 
matrices. 



i'' ""i 

Let L : Af 22 — * ^22 ^^e linear operator that is defined by ^ l^^^ \ — . Find the matrix for L with 
respect to the standard basis for M22- 

Answer: 

10 0 0 
0 0 10 
0 10 0 
0 0 0 1 

14. Let B = {ui, U2, us) and B' = ^vi, V2, be bases for a vector space V, and let 

"2 -1 3" 



1 4 
1 2 



be the transition matrix from B' to B. 

(a) Express vi, V2, V3 as linear combinations of ni, ii2, 13. 

(b) Express uj, 02, 13 as linear combinations of vi, V2, ^3- 

15. Let B = (ui, U2, U3} be a basis for a vector space V, and let J; f _> f be a linear operator for which 



[T]b= 



-3 4 7 
1 0 -2 
0 1 0 



Find [7^ £f, where 5' = ^Yi, V2, ¥3^ is the basis for V defined by 

vi=ui, V2 = ui+U2, V3 = ui+U2 + U3 



Answer: 



[T]b' = 

16. Show that the matrices 



4 0 9 
1 0 -2 
0 1 1 



are similar but that 



3 1 

-6 -2 



and 



aiid 



2 1 
1 3 

-1 2 
1 0 



are not. 

17. Suppose that 7*: f — » f is a linear operator, and 5 is a basis for V for which 





'xi —X2 + X2' 










if Wb = 






XI-X2 




^3 



Find[7^B. 



Answer: 



[T]b = 



1 -1 1 

0 1 0 

1 0 -1 



18. Let 7": f — ► f'' be a linear operator. Prove that T is one-to-one if and only if det(^) # 0. 

19. (Calculus required) 

(a) 

Show that if F = / (x) is twice differentiable, then the function D:C^^^ oo, oo J — ► || — oo, cx) J 

defined by ^ (f ) / "(^) is a Hnear transformation. 

(b) Find a basis for the kernel of D. 

(c) Show that the set of functions satisfying the equation D(f ) = / {x) is a two-dimensional subspace of 

^ — cx), cx) and find a basis for this subspace. 



Answer: 



(b) /(^)=^. g(^) = l 

(c) f(x)=e\ g(x) = e-^ 

20. Let T:P2 — ► ^'^^ function defined by the formula 



P(0) 



(a) Find7'(jr^ + 5j: + 6j. 

(b) Show that r is a linear transformation. 

(c) Show that T is one-to-one. 

(d) Findr-^(0.3,0). 

(e) Sketch the graph of the polynomial in part (d). 

21. Let xi^ X2, and be distinct real numbers such that 

x\ ' ^2 <X2 

and let T\P2—^R^^^ function defined by the formula 



(a) Show that r is a linear transformation. 

(b) Show that T is one-to-one. 



(c) Verify that if , 32, and 03 are any real numbers, then 



.-1 



where 





-^2)(^ - 


-X3) 




-^2)(^1 


-X2) 




-xi)(x- 


-X3) 




-Xl)(X2 


-X3) 








(^3 


-xi)(.X2 


-X2) 



(d) What relationship exists between the graph of the function 

and the points (xi.ai), (X2, ^2)^ (^3» <^3)? 
Answer: 

(b) The points are on the graph. 

22. (Calculus required) Let and [^(x) be continuous functions, and let V be the subspace of 
C( — 00, + 00) consisting of all twice differentiable functions. Define ; Y — ♦ by 

L(^ix))=y"ix)^pix)y'(x)^qix)yix) 

(a) Show that L is a linear transformation. 

(b) Consider the special case where p{x) = 0 and q{x) = 1. Show that the function 

^{x) = c 1 sin j: + <:2C0S x 

is in the kernel of L for all real values of and C2. 

23. Calculus required Let — > P„ be the differentiation operator ^(p) = P^ Show that the matrix for D 
relative to the basis B = IS 

0 10 0 ... 0 
0 0 2 0 ... 0 
0 0 0 3 ... 0 

■ ■ ■ ■ ■ 

: : ! ! : 

0 0 0 0 ... « 
0 0 0 0 ... 0 

24. Calculus required It can be shown that for any real number c, the vectors 



1, x-c. 



21 



form a basis for P„. Find the matrix for the differentiation operator of Exercise 23 with respect to this 
basis. 

25. Calculus required J:P„—*P„+i be the integration transformation defined by 



= aox + 



2 « + 1 



where p = ao + aiz + ... + ctnX*^- Find the matrix for J with respect to the standard bases for P„ 



Answer: 



0 0 0 

1 0 0 

0 1 0 

0 0 4 



■ ■ ■ 



0 
0 

0 
0 



0 0 0 • •

This chapter is concerned with "numerical methods" of linear algebra, an area of study
that encompasses techniques for solving large-scale linear systems and for finding 
numerical approximations of various kinds. It is not our objective to discuss algorithms 
and technical issues in fine detail, since there are many excellent books on the subject. 
Rather, we will be concerned with introducing some of the basic ideas and exploring 
important contemporary applications that rely heavily on numerical ideas — singular value 
decomposition and data compression. A computing utility such as MATLAB, 
Mathematica, or Maple is recommended for Section 9.2 to Section 9.6 . 



INTRODUCTION 






9.1 LU-Decompositions



Up to now, we have focused on two methods for solving linear systems, Gaussian elimination (reduction to row 
echelon form) and Gauss-Jordan elimination (reduction to reduced row echelon form). While these methods are 
fine for the small-scale problems in this text, they are not suitable for large-scale problems in which computer 
roundoff error, memory usage, and speed are concerns. In this section we will discuss a method for solving a linear 
system of n equations in n unknowns that is based on factoring its coefficient matrix into a product of lower and 
upper triangular matrices. This method, called "LU-decomposition," is the basis for many computer algorithms in 
common use. 



Solving Linear Systems by Factoring 



Our first goal in this section is to show how to solve a linear system Ax = b of n equations in n unknowns by 
factoring the coefficient matrix A into a product 

A = LU (1) 

where L is lower triangular and U is upper triangular. Once we understand how to do this, we will discuss how to 
obtain the factorization itself. 

Assuming that we have somehow obtained the factorization in (1), the linear system Ax = b can be solved by the 
following procedure, called LU-decomposition. 

The Method of LU-Decomposition 

Step 1. Rewrite the system Ax = b as 

LUx = b (2) 

Step 2. Define a new n × 1 matrix y by 

Ux = y (3) 

Step 3. Use (3) to rewrite (2) as Ly = b and solve this system for y. 

Step 4. Substitute y in (3) and solve for x. 

This procedure, which is illustrated in Figure 9.1.1, replaces the single linear system Ax = b by a pair of linear 
systems 

Ux = y 
Ly = b 

that must be solved in succession. However, since each of these systems has a triangular coefficient matrix, it 
generally turns out to involve no more computation to solve the two systems than to solve the original system 
directly. 
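The four-step method can be sketched in a few lines of Python. This is only an illustrative sketch, not a library routine: the function names `forward_substitution`, `back_substitution`, and `lu_solve` are hypothetical, matrices are plain lists of rows, and the factors L and U are assumed to be given with nonzero diagonal entries.

```python
def forward_substitution(L, b):
    """Solve Ly = b for y, where L is lower triangular (top row first)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        known = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - known) / L[i][i]
    return y


def back_substitution(U, y):
    """Solve Ux = y for x, where U is upper triangular (bottom row first)."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - known) / U[i][i]
    return x


def lu_solve(L, U, b):
    """Solve LUx = b: first Ly = b (Step 3), then Ux = y (Step 4)."""
    y = forward_substitution(L, b)
    return back_substitution(U, y)


# The factors and right-hand side used in Example 1 of this section.
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
print(lu_solve(L, U, [2, 2, 3]))
```

Each triangular solve touches every entry of its matrix at most once, which is why the pair of systems costs no more than eliminating on the original system.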




EXAMPLE 1 Solving Ax = b by LU-Decomposition 

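Later in this section we will derive the factorization 

\[
\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U}
\tag{4}
\]

Use this result to solve the linear system 

\[
\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}}
=
\underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}}
\]

From (4) we can rewrite this system as 

\[
\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}}
=
\underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}}
\tag{5}
\]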



Historical Note In 1979 an important library of machine-independent linear algebra 
programs called LINPACK was developed at Argonne National Laboratories. Many of the 
programs in that library use the decomposition methods that we will study in this section. 
Variations of the LINPACK routines are used in many computer programs, including 
MATLAB, Mathematica, and Maple. 



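As specified in Step 2 above, let us define $y_1$, $y_2$, and $y_3$ by the equation 

\[
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}}
=
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\mathbf{y}}
\tag{6}
\]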



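which allows us to rewrite (5) as 

\[
\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\mathbf{y}}
=
\underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}}
\]

or equivalently as 

\[
\begin{aligned}
2y_1 &= 2 \\
-3y_1 + y_2 &= 2 \\
4y_1 - 3y_2 + 7y_3 &= 3
\end{aligned}
\]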

This system can be solved by a procedure that is similar to back substitution, except that we solve the 
equations from the top down instead of from the bottom up. This procedure, called forward 
substitution, yields 

$y_1 = 1$, $y_2 = 5$, $y_3 = 2$ 

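(verify). As indicated in Step 4 above, we substitute these values into (6), which yields the linear 
system 

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}
\]

or, equivalently, 

\[
\begin{aligned}
x_1 + 3x_2 + x_3 &= 1 \\
x_2 + 3x_3 &= 5 \\
x_3 &= 2
\end{aligned}
\]

Solving this system by back substitution yields 

$x_1 = 2$, $x_2 = -1$, $x_3 = 2$ 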

(verify). 




Alan Mathison Turing (1912-1954) 

Historical Note Although the ideas were known earlier, credit for popularizing the matrix 
formulation of the LU-decomposition is often given to the British mathematician Alan 
Turing for his work on the subject in 1948. Turing, one of the great geniuses of the twentieth 
century, is the founder of the field of artificial intelligence. Among his many 
accomplishments in that field, he developed the concept of an internally programmed 
computer before the practical technology had reached the point where the construction of 



such a machine was possible. During World War II Turing was secretly recruited by the 
British government's Code and Cypher School at Bletchley Park to help break the Nazi 
Enigma codes; it was Turing's statistical approach that provided the breakthrough. In addition 
to being a brilliant mathematician, Turing was a world-class runner who competed 
successfully with Olympic-level competition. Sadly, Turing, a homosexual, was tried and 
convicted of "gross indecency" in 1952, in violation of the then-existing British statutes. 
Depressed, he committed suicide at age 41 by eating an apple laced with cyanide. 
[Image: Time & Life Pictures/Getty Images, Inc.] 



Finding LU-Decompositions 

Example 1 makes it clear that after A is factored into lower and upper triangular matrices, the system Ax = b can 
be solved by one forward substitution and one back substitution. We will now show how to obtain such 
factorizations. We begin with some terminology. 

DEFINITION 1 

A factorization of a square matrix A as A = LU, where L is lower triangular and U is upper triangular, is 
called an LU-decomposition (or LU-factorization) of A. 



Not every square matrix has an LU-decomposition. However, we will see that if it is possible to reduce a square 
matrix A to row echelon form by Gaussian elimination without performing any row interchanges, then A will have 
an LU-decomposition, though it may not be unique. To see why this is so, assume that A has been reduced to a row 
echelon form U using a sequence of row operations that does not include row interchanges. We know from 
Theorem 1.5.1 that these operations can be accomplished by multiplying A on the left by an appropriate sequence 
of elementary matrices; that is, there exist elementary matrices $E_1, E_2, \ldots, E_k$ such that 

\[
E_k \cdots E_2 E_1 A = U \tag{8}
\]

Since elementary matrices are invertible, we can solve (8) for A as 

\[
A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U
\]

or more briefly as 

\[
A = LU \tag{9}
\]

where 

\[
L = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{10}
\]



We now have all of the ingredients to prove the following result. 



THEOREM 9.1.1 



If A is a square matrix that can be reduced to a row echelon form U by Gaussian elimination without row 
interchanges, then A can be factored as A = LU, where L is a lower triangular matrix. 



Proof Let L and U be the matrices in Formulas (10) and (8), respectively. The matrix U is upper triangular because it 
is a row echelon form of a square matrix (so all entries below its main diagonal are zero). To prove that L is lower 
triangular, it suffices to prove that each factor on the right side of (10) is lower triangular, since Theorem 1.7.1b will 
then imply that L itself is lower triangular. Since row interchanges are excluded, each $E_j$ results either by adding a 
scalar multiple of one row of an identity matrix to a row below or by multiplying one row of an identity matrix by a 
nonzero scalar. In either case, the resulting matrix $E_j$ is lower triangular and hence so is $E_j^{-1}$ by Theorem 1.7.1d. 
This completes the proof. 



EXAMPLE 2 An LU-Decomposition 

Find an LU-decomposition of 

\[
A = \begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
\]

Solution To obtain an LU-decomposition, A = LU, we will reduce A to a row echelon form U using Gaussian 
elimination and then calculate L from (10). The steps are as follows: 



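For each step we list the row operation, the reduction toward row echelon form that it produces, the elementary matrix corresponding to the row operation, and the inverse of that elementary matrix. 

Step 1. Row operation: $\tfrac{1}{2}$ × row 1 

\[
\begin{bmatrix} 1 & 3 & 1 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}, \qquad
E_1 = \begin{bmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
E_1^{-1} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Step 2. Row operation: (3 × row 1) + row 2 

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 4 & 9 & 2 \end{bmatrix}, \qquad
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Step 3. Row operation: (−4 × row 1) + row 3 

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & -3 & -2 \end{bmatrix}, \qquad
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \qquad
E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}
\]

Step 4. Row operation: (3 × row 2) + row 3 

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 7 \end{bmatrix}, \qquad
E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{bmatrix}, \qquad
E_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}
\]

Step 5. Row operation: $\tfrac{1}{7}$ × row 3 

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} = U, \qquad
E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tfrac{1}{7} \end{bmatrix}, \qquad
E_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}
\]

and, from (10), 

\[
L = E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} E_5^{-1}
= \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\]

so 

\[
\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
=
\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\]

is an LU-decomposition of A. 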
Bookkeeping 



As Example 2 shows, most of the work in constructing an LU-decomposition is expended in calculating L. 
However, all this work can be eliminated by some careful bookkeeping of the operations used to reduce A to U. 

Because we are assuming that no row interchanges are required to reduce A to U, there are only two types of 
operations involved: multiplying a row by a nonzero constant, and adding a scalar multiple of one row to another. 
The first operation is used to introduce the leading 1's and the second to introduce zeros below the leading 1's. 

In Example 2, a multiplier of $\tfrac{1}{2}$ was needed in Step 1 to introduce a leading 1 in the first row, and a multiplier of $\tfrac{1}{7}$ 
was needed in Step 5 to introduce a leading 1 in the third row. No actual multiplier was required to introduce a 
leading 1 in the second row because it was already a 1 at the end of Step 2, but for convenience let us say that the 
multiplier was 1. Comparing these multipliers with the successive diagonal entries of L, we see that these diagonal 
entries are precisely the reciprocals of the multipliers used to construct U: 

\[
L = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \tag{11}
\]

Also observe in Example 2 that to introduce zeros below the leading 1 in the first row, we used the operations 

add 3 times the first row to the second 
add −4 times the first row to the third 

and to introduce the zero below the leading 1 in the second row, we used the operation 

add 3 times the second row to the third 

Now note in (12) that in each position below the main diagonal of L, the entry is the negative of the multiplier in the 
operation that introduced the zero in that position in U: 

\[
L = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \tag{12}
\]



This suggests the following procedure for constructing an LU-decomposition of a square matrix A, assuming that 
this matrix can be reduced to row echelon form without row interchanges. 

Procedure for Constructing an LU-Decomposition 

Step 1. Reduce A to a row echelon form U by Gaussian elimination without row interchanges, keeping 
track of the multipliers used to introduce the leading 1's and the multipliers used to introduce the 
zeros below the leading 1's. 

Step 2. In each position along the main diagonal of L, place the reciprocal of the multiplier that introduced 
the leading 1 in that position in U. 

Step 3. In each position below the main diagonal of L, place the negative of the multiplier used to 
introduce the zero in that position in U. 

Step 4. Form the decomposition A = LU. 
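The bookkeeping procedure can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a production routine: the name `lu_decompose` is hypothetical, exact rational arithmetic from Python's `fractions` module is used so that no roundoff occurs, and the sketch assumes that every pivot it meets is nonzero (it stops with an error instead of interchanging rows or skipping a column).

```python
from fractions import Fraction


def lu_decompose(A):
    """Return (L, U) with A = LU, where U has 1's on its main diagonal."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        pivot = U[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: this sketch does not handle that case")
        # The leading 1 is introduced with the multiplier 1/pivot,
        # so its reciprocal, pivot, goes on the diagonal of L (Step 2).
        L[k][k] = pivot
        U[k] = [v / pivot for v in U[k]]
        for i in range(k + 1, n):
            m = U[i][k]
            # The zero is introduced by adding (-m) times row k to row i,
            # so the negative of that multiplier, m, goes into L (Step 3).
            L[i][k] = m
            U[i] = [U[i][j] - m * U[k][j] for j in range(n)]
    return L, U


# The matrix of Example 3 in this section.
L, U = lu_decompose([[6, -2, 0], [9, -1, 1], [3, 7, 5]])
for row in L:
    print([str(v) for v in row])
for row in U:
    print([str(v) for v in row])
```

Note that L is filled in column by column as the elimination proceeds, so no elementary matrices or inverses are ever formed.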



EXAMPLE 3 Constructing an LU-Decomposition 

Find an LU-decomposition of 

\[
A = \begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}
\]

Solution We will reduce A to a row echelon form U and at each step we will fill in an entry of L in 
accordance with the four-step procedure above. 

Thus, we have constructed the LU-decomposition 

\[
A = LU =
\begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}
\]



We leave it for you to confirm this end result by multiplying the factors. 



LU-Decompositions Are Not Unique 

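In the absence of restrictions, LU-decompositions are not unique. For example, if 

\[
A = LU =
\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}
\]

and L has nonzero diagonal entries, then we can shift the diagonal entries from the left factor to the right factor by 
writing 

\[
\begin{aligned}
A &=
\begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix}
\begin{bmatrix} l_{11} & 0 & 0 \\ 0 & l_{22} & 0 \\ 0 & 0 & l_{33} \end{bmatrix}
\begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix} \\
&=
\begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix}
\begin{bmatrix} l_{11} & l_{11}u_{12} & l_{11}u_{13} \\ 0 & l_{22} & l_{22}u_{23} \\ 0 & 0 & l_{33} \end{bmatrix}
\end{aligned}
\]

which is another LU-decomposition of A. 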
LDU-Decompositions 



The method we have described for computing LU-decompositions may result in an "asymmetry" in that the matrix 
U has 1's on the main diagonal but L need not. However, if it is preferred to have 1's on the main diagonal of the 
lower triangular factor, then we can "shift" the diagonal entries of L to a diagonal matrix D and write L as 

L = L'D 

where L' is a lower triangular matrix with 1's on the main diagonal. For example, a general 3 × 3 lower triangular 
matrix with nonzero entries on the main diagonal can be factored as 

\[
\begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ a_{21}/a_{11} & 1 & 0 \\ a_{31}/a_{11} & a_{32}/a_{22} & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}
\]

Note that the columns of L' are obtained by dividing each entry in the corresponding column of L by the diagonal 
entry in the column. Thus, for example, we can rewrite (4) as 

\[
\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ -\tfrac{3}{2} & 1 & 0 \\ 2 & -3 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\]

One can prove that if A is a square matrix that can be reduced to row echelon form without row interchanges, then 
A can be factored uniquely as 

A = LDU 

where L is a lower triangular matrix with 1's on the main diagonal, D is a diagonal matrix, and U is an upper 
triangular matrix with 1's on the main diagonal. This is called the LDU-decomposition (or LDU-factorization) of 
A. 
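The "shift" of the diagonal entries can be illustrated with a short sketch. The helper name `split_diagonal` is hypothetical, and the sketch assumes that L is lower triangular with nonzero diagonal entries; it divides each column of L by that column's diagonal entry, exactly as described above.

```python
from fractions import Fraction


def split_diagonal(L):
    """Return (L_unit, D) with L = L_unit * D and 1's on the diagonal of L_unit."""
    n = len(L)
    D = [[Fraction(L[i][i]) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    # Column j of L_unit is column j of L divided by the diagonal entry L[j][j].
    L_unit = [[Fraction(L[i][j]) / Fraction(L[j][j]) for j in range(n)]
              for i in range(n)]
    return L_unit, D


# The lower triangular factor from Example 2.
L_unit, D = split_diagonal([[2, 0, 0], [-3, 1, 0], [4, -3, 7]])
for row in L_unit:
    print([str(v) for v in row])
print([str(D[i][i]) for i in range(3)])
```

Combined with a factor U that already has 1's on its diagonal, the pair returned here gives the three factors of an LDU-decomposition.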



PLU-Decompositions 



Many computer algorithms for solving linear systems perform row interchanges to reduce roundoff error, in which 
case the existence of an LU-decomposition is not guaranteed. However, it is possible to work around this problem 
by "preprocessing" the coefficient matrix A so that the row interchanges are performed prior to computing the 
LU-decomposition itself. More specifically, the idea is to create a matrix Q (called a permutation matrix) by 
multiplying, in sequence, those elementary matrices that produce the row interchanges and then execute them by 
computing the product QA. This product can then be reduced to row echelon form without row interchanges, so it is 
assured to have an LU-decomposition 

QA = LU (13) 

Because the matrix Q is invertible (being a product of elementary matrices), the systems Ax = b and QAx = Qb 
will have the same solutions. But it follows from (13) that the latter system can be rewritten as LUx = Qb and hence 
can be solved using LU-decomposition. 

It is common to see Equation (13) expressed as 

A = PLU (14) 

in which $P = Q^{-1}$. This is called a PLU-decomposition (or PLU-factorization) of A. 
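A rough sketch of this idea, assuming an invertible matrix, is given below. The names `plu_decompose` and `plu_solve` are hypothetical. Two choices here are assumptions of the sketch and are not taken from the text: rows are interchanged by partial pivoting (the largest available entry in each column is used as the pivot), and the 1's are placed on the diagonal of L rather than U. The list `perm` records the row order, so it plays the role of the matrix Q.

```python
def plu_decompose(A):
    """Return (perm, L, U) such that row perm[i] of A equals row i of L*U."""
    n = len(A)
    U = [[float(v) for v in row] for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))  # perm[i] = index of the original row now in position i
    for k in range(n):
        # Partial pivoting: pick the largest entry in column k among rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]
            perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            L[i][k] = m
            U[i] = [U[i][j] - m * U[k][j] for j in range(n)]
    return perm, L, U


def plu_solve(A, b):
    """Solve Ax = b by solving LUx = Qb, where Q reorders rows according to perm."""
    perm, L, U = plu_decompose(A)
    n = len(b)
    qb = [float(b[i]) for i in perm]  # the vector Qb
    y = [0.0] * n
    for i in range(n):  # forward substitution on Ly = Qb
        y[i] = (qb[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution on Ux = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


# A system whose coefficient matrix has a zero in the first pivot position.
print(plu_solve([[0, 1, 4], [1, 2, 2], [3, 1, 3]], [2, 1, 5]))
```

Because the interchanges are only recorded in `perm`, the right-hand side can be reordered in the same way for any number of vectors b without refactoring A.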



Concept Review 

• LU-decomposition 

• LDU-decomposition 

• PLU-decomposition 



Skills 

• Determine whether a square matrix has an LU-decomposition. 

• Find an LU-decomposition of a square matrix. 

• Use the method of LU-decomposition to solve linear systems. 

• Find the LDU-decomposition of a square matrix. 

• Find a PLU-decomposition of a square matrix. 



Exercise Set 9.1 

1. Use the method of Example 1 and the LU-decomposition 

to solve the system 

\[
\begin{aligned}
3x_1 - 6x_2 &= 0 \\
-2x_1 + 5x_2 &= 1
\end{aligned}
\]

Answer: 

$x_1 = 2$, $x_2 = 1$ 

2. Use the method of Example 1 and the LU-decomposition 

to solve the system 

\[
\begin{aligned}
3x_1 - 6x_2 - 3x_3 &= -3 \\
2x_1 + 6x_3 &= -22 \\
-4x_1 + 7x_2 + 4x_3 &= 3
\end{aligned}
\]

In Exercises 3-10, find an LU-decomposition of the coefficient matrix, and then use the method of Example 1 to 
solve the system. 

Answer: 

$x_1 = 3$, $x_2 = -1$ 

Answer: 

$x_1 = -1$, $x_2 = 1$, $x_3 = 0$ 
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Answer: 



^1 = 
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X2 = 1, 
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3 




0 


0 


1 5 


X4 






7 



Answer: 



$x_1 = -3$, $x_2 = 1$, $x_3 = 2$, $x_4 = 1$ 



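10. 

\[
\begin{bmatrix} 2 & -4 & 0 & 0 \\ 1 & 2 & 12 & 0 \\ 0 & -1 & -4 & -5 \\ 0 & 0 & 2 & 11 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix} 8 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]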



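11. Let 

\[
A = \begin{bmatrix} 2 & 1 & -1 \\ -2 & -1 & 2 \\ 2 & 1 & 0 \end{bmatrix}
\]

(a) Find an LU-decomposition of A. 

(b) Express A in the form $A = L_1 D U_1$, where $L_1$ is lower triangular with 1's along the main diagonal, $U_1$ is 
upper triangular, and D is a diagonal matrix. 

(c) Express A in the form $A = L_2 U_2$, where $L_2$ is lower triangular with 1's along the main diagonal and $U_2$ is 
upper triangular. 

Answer: 

(a) 

\[
A = LU =
\begin{bmatrix} 2 & 0 & 0 \\ -2 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & \tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]

(b) 

\[
A = L_1 D U_1 =
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & \tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]

(c) 

\[
A = L_2 U_2 =
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 1 & -1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]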

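In Exercises 12-13, find an LDU-decomposition of A. 

13. 

\[
A = \begin{bmatrix} 3 & -12 & 6 \\ 0 & 2 & 0 \\ 6 & -28 & 13 \end{bmatrix}
\]

Answer: 

\[
A =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -2 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -4 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]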

14. 



(a) Show that the matrix 



has no LU-decomposition. 
(b) Find a PLU-decomposition of this matrix. 



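In Exercises 15-16, use the given PLU-decomposition of A to solve the linear system Ax = b by rewriting it as 
$P^{-1}A\mathbf{x} = P^{-1}\mathbf{b}$ and solving this system by LU-decomposition. 

15. 

\[
\mathbf{b} = \begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix}; \qquad
A = \begin{bmatrix} 0 & 1 & 4 \\ 1 & 2 & 2 \\ 3 & 1 & 3 \end{bmatrix}
\]

\[
A =
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & -5 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 0 & 1 & 4 \\ 0 & 0 & 17 \end{bmatrix}
= PLU
\]

Answer: 

$x_1 = \tfrac{21}{17}$, $x_2 = -\tfrac{14}{17}$, $x_3 = \tfrac{12}{17}$ 

16. 

\[
\mathbf{b} = \begin{bmatrix} 3 \\ 0 \\ 6 \end{bmatrix}; \qquad
A = \begin{bmatrix} 4 & 1 & 2 \\ 0 & 2 & 1 \\ 8 & 1 & 8 \end{bmatrix}
\]

\[
A =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 1 & 2 \\ 0 & -1 & 4 \\ 0 & 0 & 9 \end{bmatrix}
= PLU
\]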

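In Exercises 17-18, find a PLU-decomposition of A, and use it to solve the linear system Ax = b by the method 
of Exercises 15 and 16. 

17. 

\[
A = \begin{bmatrix} 3 & -1 & 0 \\ 3 & -1 & 1 \\ 0 & 2 & 1 \end{bmatrix}; \qquad
\mathbf{b} = \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix}
\]

Answer: 

\[
A = PLU =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 3 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}
\]

$x_1 = -\tfrac{1}{2}$, $x_2 = \tfrac{1}{2}$, $x_3 = 3$ 

18. 

\[
A = \begin{bmatrix} 0 & 3 & -2 \\ 1 & 1 & 4 \\ 2 & 2 & 5 \end{bmatrix}; \qquad
\mathbf{b} = \begin{bmatrix} 7 \\ 5 \\ -2 \end{bmatrix}
\]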



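19. Let 

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

(a) Prove: If $a \neq 0$, then the matrix A has a unique LU-decomposition with 1's along the main diagonal of L. 

(b) Find the LU-decomposition described in part (a). 

Answer: 

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ \dfrac{c}{a} & 1 \end{bmatrix}
\begin{bmatrix} a & b \\ 0 & \dfrac{ad - bc}{a} \end{bmatrix}
\]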


20. Let Ax = b be a linear system of n equations in n unknowns, and assume that A is an invertible matrix that can 
be reduced to row-echelon form without row interchanges. How many additions and multiplications are 
required to solve the system by the method of Example 1? 

21. Prove: If A is any n × n matrix, then A can be factored as A = PLU, where L is lower triangular, U is upper 
triangular, and P can be obtained by interchanging the rows of $I_n$ appropriately. [Hint: Let U be a row echelon 
form of A, and let all row interchanges required in the reduction of A to U be performed first.] 

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 

(a) Every square matrix has an LU-decomposition. 
Answer: 

False 

(b) If a square matrix A is row equivalent to an upper triangular matrix U, then A has an LU-decomposition. 
Answer: 

False 

(c) If $L_1, L_2, \ldots, L_k$ are n × n lower triangular matrices, then the product $L_1 L_2 \cdots L_k$ is lower triangular. 
Answer: 

True 

(d) If a square matrix A has an LU-decomposition, then A has a unique LDU-decomposition. 
Answer: 

True 

(e) Every square matrix has a PLU-decomposition. 
Answer: 

True 
9.2 The Power Method 



The eigenvalues of a square matrix can, in theory, be found by solving the characteristic equation. However, this 
procedure has so many computational difficulties that it is almost never used in applications. In this section we will 
discuss an algorithm that can be used to approximate the eigenvalue with greatest absolute value and a corresponding 
eigenvector. This particular eigenvalue and its corresponding eigenvectors are important because they arise naturally in 
many iterative processes. The methods we will study in this section have recently been used to create Internet search 
engines such as Google. We will discuss this application in the next section. 



The Power Method 



There are many applications in which some vector $\mathbf{x}_0$ in $R^n$ is multiplied repeatedly by an n × n matrix A to produce a 
sequence 

\[
\mathbf{x}_0, \; A\mathbf{x}_0, \; A^2\mathbf{x}_0, \; \ldots, \; A^k\mathbf{x}_0, \; \ldots
\]

We call a sequence of this form a power sequence generated by A. In this section we will be concerned with the 
convergence of power sequences and how such sequences can be used to approximate eigenvalues and eigenvectors. 
For this purpose, we make the following definition. 

DEFINITION 1 

If the distinct eigenvalues of a matrix A are $\lambda_1, \lambda_2, \ldots, \lambda_k$, and if $|\lambda_1|$ is larger than $|\lambda_2|, \ldots, |\lambda_k|$, then $\lambda_1$ is 
called a dominant eigenvalue of A. Any eigenvector corresponding to a dominant eigenvalue is called a 
dominant eigenvector of A. 



EXAMPLE 1 Dominant Eigenvalues 

Some matrices have dominant eigenvalues and some do not. For example, if the distinct eigenvalues of a 
matrix are 

$\lambda_1 = -4$, $\lambda_2 = -2$, $\lambda_3 = 1$, $\lambda_4 = 3$ 

then $\lambda_1 = -4$ is dominant since $|\lambda_1| = 4$ is greater than the absolute values of all the other eigenvalues; 
but if the distinct eigenvalues of a matrix are 

then $|\lambda_1| = |\lambda_2| = 7$, so there is no eigenvalue whose absolute value is greater than the absolute value of 
all the other eigenvalues. 



The most important theorems about convergence of power sequences apply to n × n matrices with n linearly 
independent eigenvectors (symmetric matrices, for example), so we will limit our discussion to this case in this section. 



THEOREM 9.2.1 



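Let A be a symmetric n × n matrix with a positive dominant eigenvalue $\lambda$. If $\mathbf{x}_0$ is a unit vector in $R^n$ that is 
not orthogonal to the eigenspace corresponding to $\lambda$, then the normalized power sequence 

\[
\mathbf{x}_0, \quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|}, \quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|}, \quad \ldots, \quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\|A\mathbf{x}_{k-1}\|}, \quad \ldots \tag{1}
\]

converges to a unit dominant eigenvector, and the sequence 

\[
A\mathbf{x}_1 \cdot \mathbf{x}_1, \quad A\mathbf{x}_2 \cdot \mathbf{x}_2, \quad A\mathbf{x}_3 \cdot \mathbf{x}_3, \quad \ldots, \quad A\mathbf{x}_k \cdot \mathbf{x}_k, \quad \ldots \tag{2}
\]

converges to the dominant eigenvalue $\lambda$. 

Remark In the exercises we will ask you to show that (1) can also be expressed as 

\[
\mathbf{x}_0, \quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|}, \quad \mathbf{x}_2 = \frac{A^2\mathbf{x}_0}{\|A^2\mathbf{x}_0\|}, \quad \ldots, \quad \mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\|A^k\mathbf{x}_0\|}, \quad \ldots \tag{3}
\]

This form of the power sequence expresses each iterate in terms of the starting vector $\mathbf{x}_0$, rather than in terms of its 
predecessor. 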



We will not prove Theorem 9.2.1, but we can make it plausible geometrically in the 2 × 2 case where A is a symmetric 
matrix with distinct positive eigenvalues, $\lambda_1$ and $\lambda_2$, one of which is dominant. To be specific, assume that $\lambda_1$ is 
dominant and 

$\lambda_1 > \lambda_2 > 0$ 

Since we are assuming that A is symmetric and has distinct eigenvalues, it follows from Theorem 7.2.2 that the 
eigenspaces corresponding to $\lambda_1$ and $\lambda_2$ are perpendicular lines through the origin. Thus, the assumption that $\mathbf{x}_0$ is a 
unit vector that is not orthogonal to the eigenspace corresponding to $\lambda_1$ implies that $\mathbf{x}_0$ does not lie in the eigenspace 
corresponding to $\lambda_2$. To see the geometric effect of multiplying $\mathbf{x}_0$ by A, it will be useful to split $\mathbf{x}_0$ into the sum 

\[
\mathbf{x}_0 = \mathbf{v}_0 + \mathbf{w}_0 \tag{4}
\]

where $\mathbf{v}_0$ and $\mathbf{w}_0$ are the orthogonal projections of $\mathbf{x}_0$ on the eigenspaces of $\lambda_1$ and $\lambda_2$, respectively (Figure 9.2.1a). 


Figure 9.2.1 



This enables us to express $A\mathbf{x}_0$ as 

\[
A\mathbf{x}_0 = A\mathbf{v}_0 + A\mathbf{w}_0 = \lambda_1\mathbf{v}_0 + \lambda_2\mathbf{w}_0 \tag{5}
\]

which tells us that multiplying $\mathbf{x}_0$ by A "scales" the terms $\mathbf{v}_0$ and $\mathbf{w}_0$ in (4) by $\lambda_1$ and $\lambda_2$, respectively. However, $\lambda_1$ is 
larger than $\lambda_2$, so the scaling is greater in the direction of $\mathbf{v}_0$ than in the direction of $\mathbf{w}_0$. Thus, multiplying $\mathbf{x}_0$ by A 
"pulls" $\mathbf{x}_0$ toward the eigenspace of $\lambda_1$, and normalizing produces a vector $\mathbf{x}_1 = A\mathbf{x}_0 / \|A\mathbf{x}_0\|$, which is on the unit 
circle and is closer to the eigenspace of $\lambda_1$ than $\mathbf{x}_0$ (Figure 9.2.1b). Similarly, multiplying $\mathbf{x}_1$ by A and normalizing 
produces a unit vector $\mathbf{x}_2$ that is closer to the eigenspace of $\lambda_1$ than $\mathbf{x}_1$. Thus, it seems reasonable that by repeatedly 
multiplying by A and normalizing we will produce a sequence of vectors $\mathbf{x}_k$ that lie on the unit circle and converge to a 
unit vector $\mathbf{x}$ in the eigenspace of $\lambda_1$ (Figure 9.2.1c). Moreover, if $\mathbf{x}_k$ converges to $\mathbf{x}$, then it also seems reasonable that 
$A\mathbf{x}_k \cdot \mathbf{x}_k$ will converge to 

\[
A\mathbf{x} \cdot \mathbf{x} = \lambda_1\mathbf{x} \cdot \mathbf{x} = \lambda_1\|\mathbf{x}\|^2 = \lambda_1
\]

which is the dominant eigenvalue of A. 



The Power Method with Euclidean Scaling 

Theorem 9.2.1 provides us with an algorithm for approximating the dominant eigenvalue and a corresponding unit 
eigenvector of a symmetric matrix A, provided the dominant eigenvalue is positive. This algorithm, called the power 
method with Euclidean scaling, is as follows: 

The Power Method with Euclidean Scaling 

Step 1. Choose an arbitrary nonzero vector and normalize it, if need be, to obtain a unit vector $\mathbf{x}_0$. 

Step 2. Compute $A\mathbf{x}_0$ and normalize it to obtain the first approximation $\mathbf{x}_1$ to a dominant unit eigenvector. 
Compute $A\mathbf{x}_1 \cdot \mathbf{x}_1$ to obtain the first approximation to the dominant eigenvalue. 

Step 3. Compute $A\mathbf{x}_1$ and normalize it to obtain the second approximation $\mathbf{x}_2$ to a dominant unit eigenvector. 
Compute $A\mathbf{x}_2 \cdot \mathbf{x}_2$ to obtain the second approximation to the dominant eigenvalue. 

Step 4. Compute $A\mathbf{x}_2$ and normalize it to obtain the third approximation $\mathbf{x}_3$ to a dominant unit eigenvector. 
Compute $A\mathbf{x}_3 \cdot \mathbf{x}_3$ to obtain the third approximation to the dominant eigenvalue. 

Continuing in this way will usually generate a sequence of better and better approximations to the dominant 
eigenvalue and a corresponding unit eigenvector. 
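The steps above can be sketched in Python as follows. This is a minimal illustration, not a robust implementation: the names `matvec`, `dot`, and `power_method_euclidean` are hypothetical, vectors and matrices are plain lists, a fixed number of iterations is used instead of a stopping test, and the hypotheses of Theorem 9.2.1 are assumed to hold.

```python
import math


def matvec(A, x):
    """Return the product Ax for a matrix A stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    """Return the Euclidean inner product of u and v."""
    return sum(a * b for a, b in zip(u, v))


def power_method_euclidean(A, x0, iterations):
    """Return (eigenvalue estimate, unit eigenvector estimate) after the given number of steps."""
    norm0 = math.sqrt(dot(x0, x0))
    x = [v / norm0 for v in x0]          # Step 1: normalize the starting vector
    estimate = None
    for _ in range(iterations):
        y = matvec(A, x)                  # multiply by A
        norm = math.sqrt(dot(y, y))
        x = [v / norm for v in y]         # normalize to get the next iterate
        estimate = dot(matvec(A, x), x)   # Ax . x approximates the eigenvalue
    return estimate, x


# The matrix and starting vector used in Example 2 of this section.
estimate, x5 = power_method_euclidean([[3, 2], [2, 3]], [1, 0], 5)
print(round(estimate, 5), [round(v, 5) for v in x5])
```

In practice the fixed iteration count would be replaced by a test on how much the eigenvalue estimate changes between iterations.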



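EXAMPLE 2 The Power Method with Euclidean Scaling 

Apply the power method with Euclidean scaling to 

\[
A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \quad \text{with} \quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Stop at $\mathbf{x}_5$ and compare the resulting approximations to the exact values of the dominant eigenvalue and 
eigenvector. 

Solution We will leave it for you to show that the eigenvalues of A are $\lambda = 1$ and $\lambda = 5$ and that the 
eigenspace corresponding to the dominant eigenvalue $\lambda = 5$ is the line represented by the parametric 
equations $x_1 = t$, $x_2 = t$, which we can write in vector form as 

\[
\mathbf{x} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{6}
\]

Setting $t = 1/\sqrt{2}$ yields the normalized dominant eigenvector 

\[
\mathbf{v}_1 = \begin{bmatrix} \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}
\approx \begin{bmatrix} 0.707106781187\ldots \\ 0.707106781187\ldots \end{bmatrix} \tag{7}
\]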



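Now let us see what happens when we use the power method, starting with the unit vector $\mathbf{x}_0$. 

\[
\begin{aligned}
A\mathbf{x}_0 &= \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix},
& \mathbf{x}_1 &= \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|} = \frac{1}{\sqrt{13}}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \frac{1}{3.60555}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \\
A\mathbf{x}_1 &\approx \begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix},
& \mathbf{x}_2 &= \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|} \approx \frac{1}{4.90682}\begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix} \approx \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \\
A\mathbf{x}_2 &\approx \begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix},
& \mathbf{x}_3 &= \frac{A\mathbf{x}_2}{\|A\mathbf{x}_2\|} \approx \frac{1}{4.99616}\begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix} \approx \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \\
A\mathbf{x}_3 &\approx \begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix},
& \mathbf{x}_4 &= \frac{A\mathbf{x}_3}{\|A\mathbf{x}_3\|} \approx \frac{1}{4.99985}\begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix} \approx \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \\
A\mathbf{x}_4 &\approx \begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix},
& \mathbf{x}_5 &= \frac{A\mathbf{x}_4}{\|A\mathbf{x}_4\|} \approx \frac{1}{4.99999}\begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix} \approx \begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix} \\
A\mathbf{x}_5 &\approx \begin{bmatrix} 3.53576 \\ 3.53531 \end{bmatrix}
\end{aligned}
\]

\[
\begin{aligned}
\lambda^{(1)} &= A\mathbf{x}_1 \cdot \mathbf{x}_1 = (A\mathbf{x}_1)^T\mathbf{x}_1 \approx \begin{bmatrix} 3.60555 & 3.32820 \end{bmatrix}\begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx 4.84615 \\
\lambda^{(2)} &= A\mathbf{x}_2 \cdot \mathbf{x}_2 = (A\mathbf{x}_2)^T\mathbf{x}_2 \approx \begin{bmatrix} 3.56097 & 3.50445 \end{bmatrix}\begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx 4.99361 \\
\lambda^{(3)} &= A\mathbf{x}_3 \cdot \mathbf{x}_3 = (A\mathbf{x}_3)^T\mathbf{x}_3 \approx \begin{bmatrix} 3.54108 & 3.52976 \end{bmatrix}\begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx 4.99974 \\
\lambda^{(4)} &= A\mathbf{x}_4 \cdot \mathbf{x}_4 = (A\mathbf{x}_4)^T\mathbf{x}_4 \approx \begin{bmatrix} 3.53666 & 3.53440 \end{bmatrix}\begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx 4.99999 \\
\lambda^{(5)} &= A\mathbf{x}_5 \cdot \mathbf{x}_5 = (A\mathbf{x}_5)^T\mathbf{x}_5 \approx \begin{bmatrix} 3.53576 & 3.53531 \end{bmatrix}\begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix} \approx 5.00000
\end{aligned}
\]

Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue to five decimal place accuracy and $\mathbf{x}_5$ approximates the 
dominant eigenvector in (7) correctly to three decimal place accuracy. 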



It is accidental that $\lambda^{(5)}$ (the fifth approximation) 
produced five decimal place accuracy. In general, n 
iterations need not produce n decimal place 
accuracy. 



The Power Method with Maximum Entry Scaling 

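There is a variation of the power method in which the iterates, rather than being normalized at each stage, are scaled to 
make the maximum entry 1. To describe this method, it will be convenient to denote the maximum absolute value of the 
entries in a vector $\mathbf{x}$ by $\max(\mathbf{x})$. Thus, for example, if 

then $\max(\mathbf{x}) = 7$. We will need the following variation of Theorem 9.2.1. 

THEOREM 9.2.2 

Let A be a symmetric n × n matrix with a positive dominant* eigenvalue $\lambda$. If $\mathbf{x}_0$ is a nonzero vector in $R^n$ that 
is not orthogonal to the eigenspace corresponding to $\lambda$, then the sequence 

\[
\mathbf{x}_0, \quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)}, \quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)}, \quad \ldots, \quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\max(A\mathbf{x}_{k-1})}, \quad \ldots \tag{8}
\]

converges to an eigenvector corresponding to $\lambda$, and the sequence 

\[
\frac{A\mathbf{x}_1 \cdot \mathbf{x}_1}{\mathbf{x}_1 \cdot \mathbf{x}_1}, \quad \frac{A\mathbf{x}_2 \cdot \mathbf{x}_2}{\mathbf{x}_2 \cdot \mathbf{x}_2}, \quad \frac{A\mathbf{x}_3 \cdot \mathbf{x}_3}{\mathbf{x}_3 \cdot \mathbf{x}_3}, \quad \ldots, \quad \frac{A\mathbf{x}_k \cdot \mathbf{x}_k}{\mathbf{x}_k \cdot \mathbf{x}_k}, \quad \ldots \tag{9}
\]

converges to $\lambda$. 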

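Remark In the exercises we will ask you to show that (8) can be written in the alternative form 

\[
\mathbf{x}_0, \quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)}, \quad \mathbf{x}_2 = \frac{A^2\mathbf{x}_0}{\max(A^2\mathbf{x}_0)}, \quad \ldots, \quad \mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\max(A^k\mathbf{x}_0)}, \quad \ldots \tag{10}
\]

which expresses the iterates in terms of the initial vector $\mathbf{x}_0$. 

We will omit the proof of this theorem, but if we accept that (8) converges to an eigenvector of A, then it is not hard to see 
why (9) converges to the dominant eigenvalue. For this purpose we note that each term in (9) is of the form 

\[
\frac{A\mathbf{x} \cdot \mathbf{x}}{\mathbf{x} \cdot \mathbf{x}} \tag{11}
\]

which is called a Rayleigh quotient of A. In the case where $\lambda$ is an eigenvalue of A and $\mathbf{x}$ is a corresponding eigenvector, 
the Rayleigh quotient is 

\[
\frac{A\mathbf{x} \cdot \mathbf{x}}{\mathbf{x} \cdot \mathbf{x}} = \frac{\lambda\mathbf{x} \cdot \mathbf{x}}{\mathbf{x} \cdot \mathbf{x}} = \frac{\lambda(\mathbf{x} \cdot \mathbf{x})}{\mathbf{x} \cdot \mathbf{x}} = \lambda
\]

Thus, if $\mathbf{x}_k$ converges to a dominant eigenvector $\mathbf{x}$, then it seems reasonable that 

\[
\frac{A\mathbf{x}_k \cdot \mathbf{x}_k}{\mathbf{x}_k \cdot \mathbf{x}_k} \quad \text{converges to} \quad \frac{A\mathbf{x} \cdot \mathbf{x}}{\mathbf{x} \cdot \mathbf{x}} = \lambda
\]

which is the dominant eigenvalue. 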

Theorem 9.2.2 produces the following algorithm, called the power method with maximum entry scaling. 



The Power Method with Maximum Entry Scaling 



Step 1. Choose an arbitrary nonzero vector $\mathbf{x}_0$. 

Step 2. Compute $A\mathbf{x}_0$ and multiply it by the factor $1/\max(A\mathbf{x}_0)$ to obtain the first approximation $\mathbf{x}_1$ to a 
dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_1$ to obtain the first approximation to the 
dominant eigenvalue. 

Step 3. Compute $A\mathbf{x}_1$ and scale it by the factor $1/\max(A\mathbf{x}_1)$ to obtain the second approximation $\mathbf{x}_2$ to a 
dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_2$ to obtain the second approximation to the 
dominant eigenvalue. 

Step 4. Compute $A\mathbf{x}_2$ and scale it by the factor $1/\max(A\mathbf{x}_2)$ to obtain the third approximation $\mathbf{x}_3$ to a 
dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_3$ to obtain the third approximation to the 
dominant eigenvalue. 

Continuing in this way will generate a sequence of better and better approximations to the dominant 
eigenvalue and a corresponding eigenvector. 
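A sketch of this variation in Python follows. As before, this is only an illustration under stated assumptions: the names `matvec`, `dot`, and `power_method_max_scaling` are hypothetical, a fixed number of iterations is used, and the hypotheses of Theorem 9.2.2 are assumed to hold.

```python
def matvec(A, x):
    """Return the product Ax for a matrix A stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    """Return the Euclidean inner product of u and v."""
    return sum(a * b for a, b in zip(u, v))


def power_method_max_scaling(A, x0, iterations):
    """Return (Rayleigh quotient estimate, scaled eigenvector estimate)."""
    x = [float(v) for v in x0]
    estimate = None
    for _ in range(iterations):
        y = matvec(A, x)
        scale = max(abs(v) for v in y)     # max(Ax): the largest absolute entry
        x = [v / scale for v in y]         # scale so that the largest entry has size 1
        estimate = dot(matvec(A, x), x) / dot(x, x)   # Rayleigh quotient of x
    return estimate, x


# The matrix and starting vector used in Example 3 of this section.
estimate, x5 = power_method_max_scaling([[3, 2], [2, 3]], [1, 0], 5)
print(round(estimate, 5), [round(v, 5) for v in x5])
```

Scaling by the maximum entry avoids the square root needed for Euclidean normalization, at the cost of a division in each Rayleigh quotient.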




John William Strutt Rayleigh (1842-1919) 



Historical Note The British mathematical physicist John Rayleigh won the Nobel prize in physics in 1904 for 
his discovery of the inert gas argon. Rayleigh also made fundamental discoveries in acoustics and optics, and 
his work in wave phenomena enabled him to give the first accurate explanation of why the sky is blue. 
[Image: The Granger Collection, New York] 



EXAMPLE 3 Example 2 Revisited Using Maximum Entry Scaling 

Apply the power method with maximum entry scaling to 

\[
A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \quad \text{with} \quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Stop at $\mathbf{x}_5$ and compare the resulting approximations to the exact values and to the approximations 
obtained in Example 2. 



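Solution We leave it for you to confirm that 

\[
\begin{aligned}
A\mathbf{x}_0 &= \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix},
& \mathbf{x}_1 &= \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)} = \frac{1}{3}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix} \\
A\mathbf{x}_1 &\approx \begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix},
& \mathbf{x}_2 &= \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)} \approx \frac{1}{4.33333}\begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix} \\
A\mathbf{x}_2 &\approx \begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix},
& \mathbf{x}_3 &= \frac{A\mathbf{x}_2}{\max(A\mathbf{x}_2)} \approx \frac{1}{4.84615}\begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix} \\
A\mathbf{x}_3 &\approx \begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix},
& \mathbf{x}_4 &= \frac{A\mathbf{x}_3}{\max(A\mathbf{x}_3)} \approx \frac{1}{4.96825}\begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix} \\
A\mathbf{x}_4 &\approx \begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix},
& \mathbf{x}_5 &= \frac{A\mathbf{x}_4}{\max(A\mathbf{x}_4)} \approx \frac{1}{4.99361}\begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99936 \end{bmatrix}
\end{aligned}
\]

\[
\begin{aligned}
\lambda^{(1)} &= \frac{A\mathbf{x}_1 \cdot \mathbf{x}_1}{\mathbf{x}_1 \cdot \mathbf{x}_1} = \frac{(A\mathbf{x}_1)^T\mathbf{x}_1}{\mathbf{x}_1^T\mathbf{x}_1} \approx \frac{7.00000}{1.44444} \approx 4.84615 \\
\lambda^{(2)} &= \frac{A\mathbf{x}_2 \cdot \mathbf{x}_2}{\mathbf{x}_2 \cdot \mathbf{x}_2} = \frac{(A\mathbf{x}_2)^T\mathbf{x}_2}{\mathbf{x}_2^T\mathbf{x}_2} \approx \frac{9.24852}{1.85207} \approx 4.99361 \\
\lambda^{(3)} &= \frac{A\mathbf{x}_3 \cdot \mathbf{x}_3}{\mathbf{x}_3 \cdot \mathbf{x}_3} = \frac{(A\mathbf{x}_3)^T\mathbf{x}_3}{\mathbf{x}_3^T\mathbf{x}_3} \approx \frac{9.84203}{1.96851} \approx 4.99974 \\
\lambda^{(4)} &= \frac{A\mathbf{x}_4 \cdot \mathbf{x}_4}{\mathbf{x}_4 \cdot \mathbf{x}_4} = \frac{(A\mathbf{x}_4)^T\mathbf{x}_4}{\mathbf{x}_4^T\mathbf{x}_4} \approx \frac{9.96808}{1.99362} \approx 4.99999 \\
\lambda^{(5)} &= \frac{A\mathbf{x}_5 \cdot \mathbf{x}_5}{\mathbf{x}_5 \cdot \mathbf{x}_5} = \frac{(A\mathbf{x}_5)^T\mathbf{x}_5}{\mathbf{x}_5^T\mathbf{x}_5} \approx \frac{9.99360}{1.99872} \approx 5.00000
\end{aligned}
\]

Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue correctly to five decimal places and $\mathbf{x}_5$ closely 
approximates the dominant eigenvector 

\[
\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

that results by taking $t = 1$ in (6). 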



Whereas the power method with Euclidean scaling 
produces a sequence that approaches a unit 
dominant eigenvector, maximum entry scaling 
produces a sequence that approaches an eigenvector 
whose largest component is 1.
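
The iteration in Example 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes that max(Ax) means the largest absolute value among the entries of Ax. It repeats the scaling step and records the Rayleigh quotient after each iterate.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def power_method_max_scaling(A, x0, steps):
    """Return a list of (x_k, Rayleigh quotient) pairs for k = 1..steps."""
    x = list(x0)
    history = []
    for _ in range(steps):
        y = mat_vec(A, x)
        m = max(abs(yi) for yi in y)      # assumed meaning of max(Ax)
        x = [yi / m for yi in y]          # maximum entry scaling
        lam = dot(mat_vec(A, x), x) / dot(x, x)
        history.append((x, lam))
    return history


A = [[3, 2], [2, 3]]
history = power_method_max_scaling(A, [1, 0], 5)
for k, (x, lam) in enumerate(history, start=1):
    print(k, [round(c, 5) for c in x], round(lam, 5))
```

With the matrix and starting vector of Example 3, the printed iterates should agree with the five-decimal values shown in the solution.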



Rate of Convergence 

If $A$ is a symmetric matrix whose distinct eigenvalues can be arranged so that

$$|\lambda_1| > |\lambda_2| > |\lambda_3| > \cdots > |\lambda_k|$$

then the "rate" at which the Rayleigh quotients converge to the dominant eigenvalue depends on the ratio $|\lambda_1| / |\lambda_2|$;
that is, the convergence is slow when this ratio is near 1 and rapid when it is large; the greater the ratio, the more rapid
the convergence. For example, if $A$ is a 2 × 2 symmetric matrix, then the greater the ratio $|\lambda_1| / |\lambda_2|$, the greater the
disparity between the scaling effects of $\lambda_1$ and $\lambda_2$ in Figure 9.2.1, and hence the greater the effect that multiplication by
$A$ has on pulling the iterates toward the eigenspace of $\lambda_1$. Indeed, the rapid convergence in Example 3 is due to the fact
that $|\lambda_1| / |\lambda_2| = 5/1 = 5$, which is considered to be a large ratio. In cases where the ratio is close to 1, the
convergence of the power method may be so slow that other methods must be used.



Stopping Procedures 

If $\lambda$ is the exact value of the dominant eigenvalue, and if a power method produces the approximation $\lambda^{(k)}$ at the kth
iteration, then we call

$$\left| \frac{\lambda - \lambda^{(k)}}{\lambda} \right| \tag{12}$$

the relative error in $\lambda^{(k)}$. If this is expressed as a percentage, then it is called the percentage error in $\lambda^{(k)}$. For
example, if $\lambda = 5$ and the approximation after three iterations is $\lambda^{(3)} = 5.1$, then

$$\text{relative error in } \lambda^{(3)} = \left| \frac{5 - 5.1}{5} \right| = |-0.02| = 0.02$$

$$\text{percentage error in } \lambda^{(3)} = 0.02 \times 100\% = 2\%$$

In applications one usually knows the relative error $E$ that can be tolerated in the dominant eigenvalue, so the goal is to
stop computing iterates once the relative error in the approximation to that eigenvalue is less than $E$. However, there is a
problem in computing the relative error from Formula 12 in that the eigenvalue $\lambda$ is unknown. To circumvent this problem, it is
usual to estimate $\lambda$ by $\lambda^{(k)}$ and stop the computations when

$$\left| \frac{\lambda^{(k)} - \lambda^{(k-1)}}{\lambda^{(k)}} \right| < E \tag{13}$$

The quantity on the left side of 13 is called the estimated relative error in $\lambda^{(k)}$, and its percentage form is called the
estimated percentage error in $\lambda^{(k)}$.

EXAMPLE 4 Estimated Relative Error

For the computations in Example 3, find the smallest value of $k$ for which the estimated percentage error
in $\lambda^{(k)}$ is less than 0.1%.

Solution The estimated percentage errors in the approximations in Example 3 are as follows:

APPROXIMATION    ESTIMATED RELATIVE ERROR                                                                 PERCENTAGE ERROR
$\lambda^{(2)}$    $|(\lambda^{(2)} - \lambda^{(1)})/\lambda^{(2)}| \approx |(4.99361 - 4.84615)/4.99361| \approx 0.02953$    2.953%
$\lambda^{(3)}$    $|(\lambda^{(3)} - \lambda^{(2)})/\lambda^{(3)}| \approx |(4.99974 - 4.99361)/4.99974| \approx 0.00123$    0.123%
$\lambda^{(4)}$    $|(\lambda^{(4)} - \lambda^{(3)})/\lambda^{(4)}| \approx |(4.99999 - 4.99974)/4.99999| \approx 0.00005$    0.005%
$\lambda^{(5)}$    $|(\lambda^{(5)} - \lambda^{(4)})/\lambda^{(5)}| \approx |(5.00000 - 4.99999)/5.00000| \approx 0.00000$    0%

Thus, $\lambda^{(4)} \approx 4.99999$ is the first approximation whose estimated percentage error is less than 0.1%.
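
The stopping test in Formula 13 can be sketched as follows. The helper name is hypothetical, and the input list simply reuses the five-decimal Rayleigh quotients from Example 3 rather than recomputing them.

```python
def estimated_relative_errors(approximations):
    """Estimated relative error |(cur - prev) / cur| for each successive pair."""
    return [abs((cur - prev) / cur)
            for prev, cur in zip(approximations, approximations[1:])]


# Rayleigh quotients lambda^(1) .. lambda^(5) as rounded in Example 3
lams = [4.84615, 4.99361, 4.99974, 4.99999, 5.00000]
errors = estimated_relative_errors(lams)   # errors[0] belongs to lambda^(2)

tolerance = 0.001                          # 0.1% expressed as a relative error
first_k = next(k for k, e in enumerate(errors, start=2) if e < tolerance)
print(first_k, [round(e, 5) for e in errors])
```

Here `first_k` is the index of the first approximation whose estimated relative error falls below the tolerance, which matches the conclusion of Example 4.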



Remark A rule for deciding when to stop an iterative process is called a stopping procedure. In the exercises, we will 
discuss stopping procedures for the power method that are based on the dominant eigenvector rather than the dominant 
eigenvalue. 



Concept Review 

• Power sequence 

• Dominant eigenvalue 

• Dominant eigenvector 

• Power method with Euclidean scaling 

• Rayleigh quotient 

• Power method with maximum entry scaling 

• Relative error 

• Percentage error 

• Estimated relative error 

• Estimated percentage error 

• Stopping procedure 

Skills 

• Identify the dominant eigenvalue of a matrix. 

• Use the power methods described in this section to approximate a dominant eigenvector. 

• Find the estimated relative and percentage errors associated with the power methods. 



Exercise Set 9.2 

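In Exercises 1-2, the distinct eigenvalues of a matrix are given. Determine whether $A$ has a dominant eigenvalue, and
if so, find it.

1. (a) $\lambda_1 = 7$, $\lambda_2 = 3$, $\lambda_3 = -8$, $\lambda_4 = 1$
(b) $\lambda_1 = -5$, $\lambda_2 = 3$, $\lambda_3 = 2$, $\lambda_4 = 5$

Answer:

(a) $\lambda_3$ dominant

(b) No dominant eigenvalue

2. (a) $\lambda_1 = 1$, $\lambda_2 = 0$, $\lambda_3 = -3$, $\lambda_4 = 2$

(b) $\lambda_1 = -3$, $\lambda_2 = -2$, $\lambda_3 = -1$, $\lambda_4 = 3$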



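In Exercises 3-4, apply the power method with Euclidean scaling to the matrix $A$, starting with $\mathbf{x}_0$ and stopping at $\mathbf{x}_4$.
Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding unit
eigenvector.

3. $A = \begin{bmatrix} 5 & -1 \\ -1 & -1 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

Answer:

$\mathbf{x}_1 \approx \begin{bmatrix} 0.98058 \\ -0.19612 \end{bmatrix}$, $\mathbf{x}_2 \approx \begin{bmatrix} 0.98837 \\ -0.15206 \end{bmatrix}$, $\mathbf{x}_3 \approx \begin{bmatrix} 0.98679 \\ -0.16201 \end{bmatrix}$, $\mathbf{x}_4 \approx \begin{bmatrix} 0.98715 \\ -0.15977 \end{bmatrix}$;

dominant eigenvalue: $\lambda = 2 + \sqrt{10} \approx 5.16228$;

dominant eigenvector: $\begin{bmatrix} 1 \\ 3 - \sqrt{10} \end{bmatrix} \approx \begin{bmatrix} 1 \\ -0.16228 \end{bmatrix}$

4. $A = \begin{bmatrix} 7 & -2 & 0 \\ -2 & 6 & -2 \\ 0 & -2 & 5 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$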

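In Exercises 5-6, apply the power method with maximum entry scaling to the matrix $A$, starting with $\mathbf{x}_0$ and stopping
at $\mathbf{x}_4$. Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding
scaled eigenvector.

5. $A = \begin{bmatrix} 1 & -3 \\ -3 & 5 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$

Answer:

$\mathbf{x}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, $\lambda^{(1)} = 6$; $\mathbf{x}_2 = \begin{bmatrix} -0.5 \\ 1 \end{bmatrix}$, $\lambda^{(2)} = 6.6$; $\mathbf{x}_3 \approx \begin{bmatrix} -0.53846 \\ 1 \end{bmatrix}$, $\lambda^{(3)} \approx 6.60550$;
$\mathbf{x}_4 \approx \begin{bmatrix} -0.53488 \\ 1 \end{bmatrix}$, $\lambda^{(4)} \approx 6.60555$;

dominant eigenvalue: $\lambda = 3 + \sqrt{13} \approx 6.60555$;

dominant eigenvector: $\begin{bmatrix} -\dfrac{3}{\sqrt{26 + 4\sqrt{13}}} \\ \dfrac{2 + \sqrt{13}}{\sqrt{26 + 4\sqrt{13}}} \end{bmatrix} \approx \begin{bmatrix} -0.47186 \\ 0.88167 \end{bmatrix}$

6. $A = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$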



7. Let 



(a) Use the power method with maximum entry scaling to approximate a dominant eigenvector of $A$. Start with $\mathbf{x}_0$,
round off all computations to three decimal places, and stop after three iterations.

(b) Use the result in part (a) and the Rayleigh quotient to approximate the dominant eigenvalue of $A$.

(c) Find the exact values of the eigenvector and eigenvalue approximated in parts (a) and (b).

(d) Find the percentage error in the approximation of the dominant eigenvalue.

Answer:

(b) $\lambda^{(1)} = 2.8$, $\lambda^{(2)} \approx 2.976$, $\lambda^{(3)} \approx 2.997$

^'^^ Dominant eigenvalue: A = 3; dominant eigenvector: ^ J j 
(d) 0.1% 



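8. Repeat the directions of Exercise 7 with

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 10 \end{bmatrix}; \quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$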



In Exercises 9-10, a matrix $A$ with a dominant eigenvalue and a sequence $\mathbf{x}_0, A\mathbf{x}_0, \ldots, A^5\mathbf{x}_0$ are given. Use Formulas
9 and 10 to approximate the dominant eigenvalue and a corresponding eigenvector.



i} '^=[1} ^=[2} 



Answer: 



2.99993; 



0.99180 
00000 



a\ = 



14 
13 



A% = 



40 
41 



.5 [122] 



11. Consider matrices 



where xq is a unit vector and 3 0- Show that even though the matrix A is symmetric and has a dominant 
eigenvalue, the power sequence 1 in Theorem 9.2.1 does not converge. This shows that the requirement in that 
theorem that the dominant eigenvalue be positive is essential. 

12. Use the power method with Euclidean scaling to approximate the dominant eigenvalue and a corresponding 

eigenvector of A. Choose your own starting vector, and stop when the estimated percentage error in the eigenvalue 
approximation is less than 0.1%. 



(a) 


1 
1 


^2 
D 










A 


— i 








_| 


1 C\ 




(b) $\begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 2 & -1 & 1 \\ 1 & -1 & 4 & 1 \\ 1 & 1 & 1 & 8 \end{bmatrix}$



13. Repeat Exercise 12, but this time stop when all corresponding entries in two successive eigenvector approximations 
differ by less than 0.01 in absolute value. 



Answer: 

(a) 



Starting with 



(b) 



Starting with 



it takes 8 iterations. 



it takes 8 iterations. 



14. Repeat Exercise 12 using maximum entry scaling. 

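15. Prove: If $A$ is a nonzero $n \times n$ matrix, then $A^TA$ and $AA^T$ have positive dominant eigenvalues.

16. (For readers familiar with proof by induction) Let $A$ be an $n \times n$ matrix, let $\mathbf{x}_0$ be a unit vector in $R^n$, and define
the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ by

$$\mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|}, \quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|}, \quad \ldots, \quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\|A\mathbf{x}_{k-1}\|}, \quad \ldots$$

Prove by induction that $\mathbf{x}_k = A^k\mathbf{x}_0 / \|A^k\mathbf{x}_0\|$.

17. (For readers familiar with proof by induction) Let $A$ be an $n \times n$ matrix, let $\mathbf{x}_0$ be a nonzero vector in $R^n$, and
define the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ by

$$\mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)}, \quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)}, \quad \ldots, \quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\max(A\mathbf{x}_{k-1})}, \quad \ldots$$

Prove by induction that

$$\mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\max(A^k\mathbf{x}_0)}$$

9.3 Internet Search Engines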

Early search engines on the Internet worked by examining key words and phrases in pages and titles of posted documents. Today's most popular search engines use algorithms 
based on the power method to analyze hyperlinks (references) between documents. In this section we will discuss one of the ways in which this is done. 

Google, the most widely used engine for searching the Internet, was developed in 1996 by Larry Page and Sergey Brin while both were graduate students at Stanford University. 
Google uses a procedure known as the PageRank algorithm to analyze how documents at relevant sites reference one another. It then assigns to each site a PageRank score, 
stores those scores as a matrix, and uses the components of the dominant eigenvector of that matrix to establish the relative importance of the sites to the search. 

Google starts by using a standard text-based search engine to find an initial set $S_0$ of sites containing relevant pages. Since words can have multiple meanings, the set $S_0$ will
typically contain irrelevant sites and miss others of relevance. To compensate for this, the set $S_0$ is expanded to a larger set $S_1$ by adjoining all sites referenced by the pages in the
sites of $S_0$. The underlying assumption is that $S_1$ will contain the most important sites relevant to the search. This process is then repeated a number of times to refine the search
information still further.

To be more specific, suppose that the search set $S$ contains $n$ sites, and define the adjacency matrix for $S$ to be the $n \times n$ matrix $A = [a_{ij}]$ in which

$a_{ij} = 1$ if site $i$ references site $j$
$a_{ij} = 0$ if site $i$ does not reference site $j$
We will assume that no site references itself, so the diagonal entries of $A$ will all be zero.



EXAMPLE 1 Adjacency Matrices

Here is a typical adjacency matrix for a search set with four sites (row $i$ corresponds to referencing Site $i$, and column $j$ to referenced Site $j$):

$$A = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \tag{1}$$

Thus, Site 1 references Sites 3 and 4, Site 2 references Site 1, and so forth.



There are two basic roles that a site can play in the search process — the site may be a hub, meaning that it references many other sites, or it may be an authority, meaning that it 
is referenced by many other sites. A given site will typically have both hub and authority properties in that it will both reference and be referenced. 



Historical Note The term google is a variation of the word googol, which stands for the number $10^{100}$ (1 followed by 100 zeros). This term was invented by the
American mathematician Edward Kasner (1878-1955) in 1938, and the story goes that it came about when Kasner asked his eight-year-old nephew to give a name to a
really big number; he responded with "googol." Kasner then went on to define a googolplex to be $10^{\text{googol}}$ (1 followed by googol zeros).



In general, if $A$ is an adjacency matrix for $n$ sites, then the column sums of $A$ measure the authority aspect of the sites and the row sums of $A$ measure their hub aspect. For
example, the column sums of the matrix in 1 are 3, 1, 2, and 2, which means that Site 1 is referenced by three other sites, Site 2 is referenced by one other site, and so forth.
Similarly, the row sums of the matrix in 1 are 2, 1, 2, and 3, so Site 1 references two other sites, Site 2 references one other site, and so forth.

Accordingly, if $A$ is an adjacency matrix, then we call the vector $\mathbf{h}_0$ of row sums of $A$ the initial hub vector of $A$, and we call the vector $\mathbf{a}_0$ of column sums of $A$ the initial
authority vector of $A$. Alternatively, we can think of $\mathbf{a}_0$ as the vector of row sums of $A^T$, which turns out to be more convenient for computations. The entries in the hub vector
are called hub weights and those in the authority vector authority weights.



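EXAMPLE 2 Initial Hub and Authority Vectors of an Adjacency Matrix

Find the initial hub and authority vectors for the adjacency matrix $A$ in Example 1.

Solution The row sums of $A$ yield the initial hub vector

$$\mathbf{h}_0 = \begin{bmatrix} 2 \\ 1 \\ 2 \\ 3 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \tag{2}$$

and the row sums of $A^T$ (the column sums of $A$) yield the initial authority vector

$$\mathbf{a}_0 = \begin{bmatrix} 3 \\ 1 \\ 2 \\ 2 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \tag{3}$$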
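
As a small illustrative sketch (the variable names are my own), the initial hub and authority vectors of Example 2 are just the row sums and column sums of the adjacency matrix from Example 1:

```python
# Adjacency matrix from Example 1: row i = referencing site, column j = referenced site
A = [
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]

h0 = [sum(row) for row in A]         # row sums: initial hub vector
a0 = [sum(col) for col in zip(*A)]   # column sums: initial authority vector

print("h0 =", h0)
print("a0 =", a0)
```

Here `zip(*A)` iterates over the columns of `A`, so the second comprehension computes the row sums of the transpose.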



The link counting in Example 2 suggests that Site 4 is the major hub and Site 1 is the greatest authority. However, counting links does not tell the whole story; for example, it
seems reasonable that if Site 1 is to be considered the greatest authority, then more weight should be given to hubs that link to that site, and if Site 4 is to be considered a major
hub, then more weight should be given to sites to which it links. Thus, there is an interaction between hubs and authorities that needs to be accounted for in the search process.
Accordingly, once the search engine has calculated the initial authority vector $\mathbf{a}_0$, it then uses the information in that vector to create new hub and authority vectors $\mathbf{h}_1$ and $\mathbf{a}_1$
using the formulas

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|} \quad \text{and} \quad \mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|} \tag{4}$$



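The numerators in these formulas do the weighting, and the normalization serves to control the size of the entries. To understand how the numerators accomplish the weighting,
view the product $A\mathbf{a}_0$ as a linear combination of the column vectors of $A$ with coefficients from $\mathbf{a}_0$. For example, with the adjacency matrix in Example 1 and the authority vector
calculated in Example 2 we have

$$A\mathbf{a}_0 = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 2 \\ 2 \end{bmatrix} = 3\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 5 \\ 6 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix}$$

Thus, we see that the links to each referenced site are weighted by the authority values in $\mathbf{a}_0$. To control the size of the entries, the search engine normalizes $A\mathbf{a}_0$ to produce the
updated hub vector

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|} = \frac{1}{\sqrt{86}} \begin{bmatrix} 4 \\ 3 \\ 5 \\ 6 \end{bmatrix} \approx \begin{bmatrix} 0.43133 \\ 0.32350 \\ 0.53916 \\ 0.64700 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \quad \text{(new hub weights)}$$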



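The new hub vector $\mathbf{h}_1$ can now be used to update the authority vector using Formula 4. The product $A^T\mathbf{h}_1$ performs the weighting, and the normalization controls the size:

$$A^T\mathbf{h}_1 \approx \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0.43133 \\ 0.32350 \\ 0.53916 \\ 0.64700 \end{bmatrix} = 0.43133\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} + 0.32350\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + 0.53916\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 0.64700\begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \approx \begin{bmatrix} 1.50966 \\ 0.64700 \\ 1.07833 \\ 0.97049 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix}$$

(Here the rows of $A^T$ correspond to referenced sites and the columns to referencing sites.)

$$\mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|} \approx \frac{1}{2.19142} \begin{bmatrix} 1.50966 \\ 0.64700 \\ 1.07833 \\ 0.97049 \end{bmatrix} \approx \begin{bmatrix} 0.68889 \\ 0.29524 \\ 0.49207 \\ 0.44286 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \quad \text{(new authority weights)}$$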
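
The single update step in Formula 4 can be sketched as follows. The helper names (`mat_vec`, `transpose`, `normalize`) are hypothetical; the data are the adjacency matrix of Example 1 and the initial authority vector of Example 2.

```python
import math


def mat_vec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def normalize(v):
    """Scale a vector to Euclidean length 1."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


A = [[0, 0, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 1, 1, 0]]
a0 = [3, 1, 2, 2]

h1 = normalize(mat_vec(A, a0))             # h1 = A a0 / ||A a0||
a1 = normalize(mat_vec(transpose(A), h1))  # a1 = A^T h1 / ||A^T h1||

print([round(c, 5) for c in h1])
print([round(c, 5) for c in a1])
```

Because the sketch carries full floating-point precision, the last printed digit can differ slightly from the text, which rounds intermediate results to five decimal places.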



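Once the updated hub and authority vectors, $\mathbf{h}_1$ and $\mathbf{a}_1$, are obtained, the search engine repeats the process and computes a succession of hub and authority vectors, thereby
generating the interrelated sequences

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|}, \quad \mathbf{h}_2 = \frac{A\mathbf{a}_1}{\|A\mathbf{a}_1\|}, \quad \mathbf{h}_3 = \frac{A\mathbf{a}_2}{\|A\mathbf{a}_2\|}, \quad \ldots, \quad \mathbf{h}_k = \frac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|}, \quad \ldots \tag{5}$$

$$\mathbf{a}_0, \quad \mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|}, \quad \mathbf{a}_2 = \frac{A^T\mathbf{h}_2}{\|A^T\mathbf{h}_2\|}, \quad \ldots, \quad \mathbf{a}_k = \frac{A^T\mathbf{h}_k}{\|A^T\mathbf{h}_k\|}, \quad \ldots \tag{6}$$

However, each of these is a power sequence in disguise. For example, if we substitute the expression for $\mathbf{h}_k$ into the expression for $\mathbf{a}_k$, then we obtain

$$\mathbf{a}_k = \frac{A^T\mathbf{h}_k}{\|A^T\mathbf{h}_k\|} = \frac{A^T\left(\dfrac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|}\right)}{\left\|A^T\left(\dfrac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|}\right)\right\|} = \frac{(A^TA)\mathbf{a}_{k-1}}{\|(A^TA)\mathbf{a}_{k-1}\|}$$

which means that we can rewrite 6 as

$$\mathbf{a}_0, \quad \mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|}, \quad \mathbf{a}_2 = \frac{(A^TA)\mathbf{a}_1}{\|(A^TA)\mathbf{a}_1\|}, \quad \ldots, \quad \mathbf{a}_k = \frac{(A^TA)\mathbf{a}_{k-1}}{\|(A^TA)\mathbf{a}_{k-1}\|}, \quad \ldots \tag{7}$$

Similarly, we can rewrite 5 as

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|}, \quad \mathbf{h}_2 = \frac{(AA^T)\mathbf{h}_1}{\|(AA^T)\mathbf{h}_1\|}, \quad \ldots, \quad \mathbf{h}_k = \frac{(AA^T)\mathbf{h}_{k-1}}{\|(AA^T)\mathbf{h}_{k-1}\|}, \quad \ldots \tag{8}$$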



Remark In Exercise 15 of Section 9.2 you were asked to show that $A^TA$ and $AA^T$ both have positive dominant eigenvalues. That being the case, Theorem 9.2.1 ensures that 7
and 8 converge to the dominant eigenvectors of $A^TA$ and $AA^T$, respectively. The entries in those eigenvectors are the authority and hub weights that Google uses to rank the
search sites in order of importance as hubs and authorities.



EXAMPLE 3 A Ranking Procedure

Suppose that a search engine produces 10 Internet sites in its search set and that the adjacency matrix for those sites is

$$A = \begin{bmatrix}
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}$$

where row $i$ corresponds to referencing Site $i$ and column $j$ to referenced Site $j$ ($i, j = 1, 2, \ldots, 10$). Use Formula 7 to rank the sites in decreasing order of authority.

Solution We will take $\mathbf{a}_0$ to be the normalized vector of column sums of $A$, and then we will compute the iterates in 7 until the authority vectors seem to
stabilize. We leave it for you to show that

$$\mathbf{a}_0 = \frac{1}{\sqrt{54}} \begin{bmatrix} 0 \\ 2 \\ 1 \\ 1 \\ 5 \\ 3 \\ 1 \\ 3 \\ 0 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0.27217 \\ 0.13608 \\ 0.13608 \\ 0.68041 \\ 0.40825 \\ 0.13608 \\ 0.40825 \\ 0 \\ 0.27217 \end{bmatrix}$$

and that

$$(A^TA)\mathbf{a}_0 \approx \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 & 2 & 0 & 0 & 2 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 2 & 1 & 1 & 5 & 0 & 0 & 2 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 3 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 & 2 & 0 & 0 & 3 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 2
\end{bmatrix}
\begin{bmatrix} 0 \\ 0.27217 \\ 0.13608 \\ 0.13608 \\ 0.68041 \\ 0.40825 \\ 0.13608 \\ 0.40825 \\ 0 \\ 0.27217 \end{bmatrix}
\approx \begin{bmatrix} 0 \\ 3.26599 \\ 1.90516 \\ 1.90516 \\ 5.30723 \\ 1.36083 \\ 0.54433 \\ 3.67423 \\ 0 \\ 2.17732 \end{bmatrix}$$

Thus,

$$\mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|} \approx \frac{1}{8.15362} \begin{bmatrix} 0 \\ 3.26599 \\ 1.90516 \\ 1.90516 \\ 5.30723 \\ 1.36083 \\ 0.54433 \\ 3.67423 \\ 0 \\ 2.17732 \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0.40056 \\ 0.23366 \\ 0.23366 \\ 0.65090 \\ 0.16690 \\ 0.06676 \\ 0.45063 \\ 0 \\ 0.26704 \end{bmatrix}$$

Continuing in this way, with $\mathbf{a}_k = (A^TA)\mathbf{a}_{k-1} / \|(A^TA)\mathbf{a}_{k-1}\|$, yields the following authority iterates:

Site    a_0      a_1      a_2      a_3      a_4      a_9      a_10
1       0        0        0        0        0        0        0
2       0.27217  0.40056  0.41652  0.41918  0.41973  0.41990  0.41990
3       0.13608  0.23366  0.24917  0.25233  0.25309  0.25337  0.25337
4       0.13608  0.23366  0.24917  0.25233  0.25309  0.25337  0.25337
5       0.68041  0.65090  0.63407  0.62836  0.62665  0.62597  0.62597
6       0.40825  0.16690  0.06322  0.02372  0.00889  0.00007  0.00002
7       0.13608  0.06676  0.02603  0.00981  0.00368  0.00003  0.00001
8       0.40825  0.45063  0.46672  0.47050  0.47137  0.47165  0.47165
9       0        0        0        0        0        0        0
10      0.27217  0.26704  0.27892  0.28300  0.28416  0.28460  0.28460

The small changes between $\mathbf{a}_9$ and $\mathbf{a}_{10}$ suggest that the iterates have stabilized near a dominant eigenvector of $A^TA$. From the entries in $\mathbf{a}_{10}$ we conclude that Sites
1, 6, 7, and 9 are probably irrelevant to the search and that the remaining sites should be searched in order of decreasing importance as

Site 5, Site 8, Site 2, Site 10, Sites 3 and 4 (a tie)
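
The ranking procedure of Example 3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a description of how any production search engine is implemented: the helper names are hypothetical, and the number of iterations (10) is simply the one used in the example.

```python
import math

# Adjacency matrix of Example 3 (row = referencing site, column = referenced site)
A = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
]


def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def authority_iterate(A, steps):
    """Apply a_k = (A^T A) a_{k-1} / ||(A^T A) a_{k-1}||, starting from normalized column sums."""
    At = [list(col) for col in zip(*A)]
    a = normalize([sum(col) for col in zip(*A)])
    for _ in range(steps):
        a = normalize(mat_vec(At, mat_vec(A, a)))   # (A^T A) a computed as A^T (A a)
    return a


a10 = authority_iterate(A, 10)
# Sites sorted by decreasing authority weight (site numbers are 1-based)
ranking = sorted(range(1, 11), key=lambda site: -a10[site - 1])
print([round(c, 5) for c in a10])
print(ranking)
```

Computing $(A^TA)\mathbf{a}$ as $A^T(A\mathbf{a})$ avoids forming the product matrix explicitly, which matters for large, sparse adjacency matrices.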



Concept Review 

• Adjacency matrix 

• Hub vector 

• Authority vector 

• Hub weights 

• Authority weights 

Skills 

• Find the initial hub and authority vectors of an adjacency matrix. 

• Use the method of Example 3 to rank sites. 



Exercise Set 9.3 



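In Exercises 1-2, find the initial hub and authority vectors for the given adjacency matrix $A$ (row $i$ corresponds to referencing Site $i$, column $j$ to referenced Site $j$).

1. $A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}$

Answer:

$\mathbf{h}_0 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$, $\mathbf{a}_0 = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}$

2. $A = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}$

In Exercises 3-4, find the updated hub and authority vectors $\mathbf{h}_1$ and $\mathbf{a}_1$ for the adjacency matrix $A$.

3. The matrix in Exercise 1.

Answer:

$\mathbf{h}_1 \approx \begin{bmatrix} 0.39057 \\ 0.65094 \\ 0.65094 \end{bmatrix}$, $\mathbf{a}_1 \approx \begin{bmatrix} 0.60971 \\ 0 \\ 0.79262 \end{bmatrix}$

4. The matrix in Exercise 2.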

In Exercises 5-8, the adjacency matrix $A$ of an Internet search engine is given (row $i$ corresponds to referencing Site $i$, column $j$ to referenced Site $j$). Use the method of Example 3 to rank the sites in decreasing order of authority.

5. $A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$

Answer:

Sites 1 and 2 (tie); sites 3 and 4 are irrelevant

6. $A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}$

7. $A = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}$

Answer:

Site 2, site 3, site 4; sites 1 and 5 are irrelevant

8. [A 10 × 10 adjacency matrix with referencing and referenced Sites 1-10; its entries are not recoverable from the source.]

9.4 Comparison of Procedures for Solving Linear Systems

There is an old saying that "time is money." This is especially true in industry where the cost of solving a linear
system is generally determined by the time it takes for a computer to perform the required computations. This
typically depends both on the speed of the computer processor and on the number of operations required by the
algorithm. Thus, choosing the right algorithm has important financial implications in an industrial or research setting.
In this section we will discuss some of the factors that affect the choice of algorithms for solving large-scale linear
systems.



Flops and the Cost of Solving a Linear System 

In computer jargon, an arithmetic operation ($+$, $-$, $\times$, $\div$) on two real numbers is called a flop, which is an acronym
for "floating-point operation." The total number of flops required to solve a problem, which is called the cost of the
solution, provides a convenient way of choosing between various algorithms for solving the problem. When needed,
the cost in flops can be converted to units of time or money if the speed of the computer processor and the financial
aspects of its operation are known. For example, many of today's personal computers are capable of performing in
excess of 10 gigaflops per second (1 gigaflop = $10^9$ flops). Thus, an algorithm that costs 1,000,000 flops would be
executed in 0.0001 seconds.

To illustrate how costs (in flops) can be computed, let us count the number of flops required to solve a linear system
of n equations in n unknowns by Gauss-Jordan elimination. For this purpose we will need the following formulas for
the sum of the first n positive integers and the sum of the squares of the first n positive integers:

$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \tag{1}$$

$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{2}$$



Let $A\mathbf{x} = \mathbf{b}$ be a linear system of n equations in n unknowns to be solved by Gauss-Jordan elimination (or,
equivalently, by Gaussian elimination with back substitution). For simplicity, let us assume that $A$ is invertible and
that no row interchanges are required to reduce the augmented matrix $[A \mid \mathbf{b}]$ to row echelon form. The diagrams that
accompany the following analysis provide a convenient way of counting the operations required to introduce a
leading 1 in the first row and then zeros below it. In our operation counts, we will lump divisions and multiplications
together as "multiplications," and we will lump additions and subtractions together as "additions."



Step 1. It requires n flops (multiplications) to introduce the leading 1 in the first row.

[Diagram: the augmented matrix with the leading 1 in row 1. × denotes a quantity that is being computed; • denotes a quantity that is not being computed. The augmented matrix size is $n \times (n + 1)$.]

Step 2. It requires n multiplications and n additions to introduce a zero below the leading 1, and there are $n - 1$ rows
below the leading 1, so the number of flops required to introduce zeros below the leading 1 is $2n(n - 1)$.

[Diagram: the augmented matrix with zeros introduced below the leading 1; the entries marked × in rows 2 through n are the quantities being computed.]

Column 1. Combining Steps 1 and 2, the number of flops required for column 1 is

$$n + 2n(n - 1) = 2n^2 - n$$

Column 2. The procedure for column 2 is the same as for column 1, except that now we are
dealing with one less row and one less column. Thus, the number of flops
required to introduce the leading 1 in row 2 and the zeros below it can be obtained
by replacing $n$ by $n - 1$ in the flop count for the first column. Thus, the number of
flops required for column 2 is

$$2(n - 1)^2 - (n - 1)$$

Column 3. By the argument for column 2, the number of flops required for column 3 is

$$2(n - 2)^2 - (n - 2)$$

Total for all columns. The pattern should now be clear. The total number of flops required to create the n
leading 1's and the associated zeros is

$$(2n^2 - n) + [2(n - 1)^2 - (n - 1)] + [2(n - 2)^2 - (n - 2)] + \cdots + (2 - 1)$$

which we can rewrite as

$$2[n^2 + (n - 1)^2 + \cdots + 1] - [n + (n - 1) + \cdots + 1]$$

or on applying Formulas 1 and 2 as

$$2\,\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} = \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n$$

Next, let us count the number of operations required to complete the backward
phase (the back substitution).

Column n. It requires $n - 1$ multiplications and $n - 1$ additions to introduce zeros above the
leading 1 in the nth column, so the total number of flops required for the column
is $2(n - 1)$.

[Diagram: the row echelon form with zeros introduced above the leading 1 in column n.]

Column (n − 1). The procedure is the same as for column n, except that now we are dealing with one
less row. Thus, the number of flops required for the (n − 1)st column is $2(n - 2)$.

[Diagram: the matrix with zeros introduced above the leading 1's in columns n − 1 and n.]

Column (n − 2). By the argument for column (n − 1), the number of flops required for column
(n − 2) is $2(n - 3)$.

Total. The pattern should now be clear. The total number of flops to complete the
backward phase is

$$2(n - 1) + 2(n - 2) + 2(n - 3) + \cdots + 2(n - n) = 2[n^2 - (1 + 2 + \cdots + n)]$$

which we can rewrite using Formula 1 as

$$2\left(n^2 - \frac{n(n+1)}{2}\right) = n^2 - n$$



In summary, we have shown that for Gauss-Jordan elimination the number of flops required for the forward and
backward phases is

$$\text{flops for forward phase} = \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n \tag{3}$$

$$\text{flops for backward phase} = n^2 - n \tag{4}$$

Thus, the total cost of solving a linear system by Gauss-Jordan elimination is

$$\text{flops for both phases} = \frac{2}{3}n^3 + \frac{3}{2}n^2 - \frac{7}{6}n \tag{5}$$

Cost Estimates for Solving Large Linear Systems 

It is a property of polynomials that for large values of the independent variable the term of highest power makes the
major contribution to the value of the polynomial. Thus, for large linear systems we can use 3 and 4 to approximate
the number of flops in the forward and backward phases as

$$\text{flops for forward phase} \approx \frac{2}{3}n^3 \tag{6}$$

$$\text{flops for backward phase} \approx n^2 \tag{7}$$

This shows that it is more costly to execute the forward phase than the backward phase for large linear systems.
Indeed, the cost difference between the forward and backward phases can be enormous, as the next example shows.

EXAMPLE 1 Cost of Solving a Large Linear System

Approximate the time required to execute the forward and backward phases of Gauss-Jordan
elimination for a system of 10,000 ($= 10^4$) equations in 10,000 unknowns using a computer that can
execute 10 gigaflops per second.

Solution We have $n = 10^4$ for the given system, so from 6 and 7 the number of gigaflops required
for the forward and backward phases is

$$\text{gigaflops for forward phase} \approx \frac{2}{3}n^3 \times 10^{-9} = \frac{2}{3}\left(10^{12}\right) \times 10^{-9} = \frac{2}{3} \times 10^3$$

$$\text{gigaflops for backward phase} \approx n^2 \times 10^{-9} = \left(10^8\right) \times 10^{-9} = 10^{-1}$$

Thus, at 10 gigaflops/s the execution times for the forward and backward phases are

$$\text{time for forward phase} \approx \left(\frac{2}{3} \times 10^3\right) \times 10^{-1} \text{ s} \approx 66.67 \text{ s}$$

$$\text{time for backward phase} \approx \left(10^{-1}\right) \times 10^{-1} \text{ s} = 0.01 \text{ s}$$
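
The arithmetic in Example 1 can be reproduced with a short sketch. The function name and parameters are hypothetical, and the estimate uses only the leading terms in Formulas 6 and 7.

```python
def gauss_jordan_phase_times(n, gigaflops_per_second):
    """Approximate (forward, backward) execution times in seconds for large n."""
    flops_per_second = gigaflops_per_second * 1e9
    forward = (2 / 3) * n**3 / flops_per_second   # Formula 6
    backward = n**2 / flops_per_second            # Formula 7
    return forward, backward


forward_s, backward_s = gauss_jordan_phase_times(10_000, 10)
print(f"forward phase:  {forward_s:.2f} s")
print(f"backward phase: {backward_s:.2f} s")
```

Changing `n` or the processor speed shows how quickly the forward phase comes to dominate the total time.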



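We leave it as an exercise for you to confirm the results in Table 1.

Table 1

Approximate Cost for an $n \times n$ Matrix $A$ with Large $n$

Algorithm                                                        Cost in Flops
Gauss-Jordan elimination (forward phase)                         $\approx \tfrac{2}{3}n^3$
Gauss-Jordan elimination (backward phase)                        $\approx n^2$
LU-decomposition of $A$                                          $\approx \tfrac{2}{3}n^3$
Forward substitution to solve $L\mathbf{y} = \mathbf{b}$         $\approx n^2$
Backward substitution to solve $U\mathbf{x} = \mathbf{y}$        $\approx n^2$
$A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$           $\approx 2n^3$
Compute $A^{-1}\mathbf{b}$                                       $\approx 2n^3$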



Considerations in Choosing an Algorithm for Solving a Linear System 

For a single linear system $A\mathbf{x} = \mathbf{b}$ of n equations in n unknowns, the methods of LU-decomposition and
Gauss-Jordan elimination differ in bookkeeping but otherwise involve the same number of flops. Thus, neither method has
a cost advantage over the other. However, LU-decomposition has other advantages that make it the method of
choice:

• Gauss-Jordan elimination and Gaussian elimination both use the augmented matrix $[A \mid \mathbf{b}]$, so $\mathbf{b}$ must be known.
In contrast, LU-decomposition uses only the matrix $A$, so once that decomposition is known it can be used with as
many right-hand sides as are required, one at a time.

• The LU-decomposition that is computed to solve $A\mathbf{x} = \mathbf{b}$ can be used to compute $A^{-1}$, if needed, with little
additional work.

• For large linear systems in which computer memory is at a premium, one can dispense with the storage of the 1's
and zeros that appear on or below the main diagonal of $U$, since those entries are known from the form of $U$. The
space that this opens up can then be used to store the entries of $L$, thereby reducing the amount of memory
required to solve the system.

• If $A$ is a large matrix consisting mostly of zeros, and if the nonzero entries are concentrated in a "band" around the
main diagonal, then there are techniques that can be used to reduce the cost of LU-decomposition, giving it an
advantage over Gauss-Jordan elimination.

The cost in flops for Gaussian elimination is the
same as that for the forward phase of
Gauss-Jordan elimination.



Concept Review 

• Flop 

• Formula for the sum of the first n positive integers 

• Formula for the sum of the squares of the first n positive integers 

• Cost in flops for solving large linear systems by various methods 

• Cost in flops for inverting a matrix by row reduction 

• Issues to consider when choosing an algorithm to solve a large linear system 
Skills 

• Compute the cost of solving a linear system by Gauss-Jordan elimination. 

• Approximate the time required to execute the forward and backward phases of Gauss-Jordan elimination.

• Approximate the time required to find an LU-decomposition of a matrix.

• Approximate the time required to find the inverse of an invertible matrix. 



Exercise Set 9.4 

1. A certain computer can execute 10 gigaflops per second. Use Formula 5 to find the time required to solve the 
system using Gauss-Jordan elimination. 

(a) A system of 1000 equations in 1000 unknowns. 

(b) A system of 10,000 equations in 10,000 unknowns. 

(c) A system of 100,000 equations in 100,000 unknowns. 



Answer: 



(a) ≈ 0.067 second

(b) ≈ 66.68 seconds

(c) ≈ 66,668 seconds, or about 18.5 hours

2. A certain computer can execute 100 gigaflops per second. Use Formula 5 to find the time required to solve the 
system using Gauss-Jordan elimination. 

(a) A system of 10,000 equations in 10,000 unknowns. 

(b) A system of 100,000 equations in 100,000 unknowns. 

(c) A system of 1,000,000 equations in 1,000,000 unknowns. 

3. Today's personal computers can execute 70 gigaflops per second. Use Table 1 to estimate the time required to
perform the following operations on the invertible 10,000 × 10,000 matrix A.

(a) Execute the forward phase of Gauss-Jordan elimination.

(b) Execute the backward phase of Gauss-Jordan elimination.

(c) LU-decomposition of A.

Answer:

(a) ≈ 9.52 seconds

(b) ≈ 0.0014 second

(c) ≈ 9.52 seconds

(d) ≈ 28.6 seconds

4. The IBM Roadrunner computer can operate at speeds in excess of 1 petaflop per second (1 petaflop = $10^{15}$
flops). Use Table 1 to estimate the time required to perform the following operations on the invertible
100,000 × 100,000 matrix A.

(a) Execute the forward phase of Gauss-Jordan elimination.

(b) Execute the backward phase of Gauss-Jordan elimination.

(c) LU-decomposition of A.



5. (a) Approximate the time required to execute the forward phase of Gauss-Jordan elimination for a system of
100,000 equations in 100,000 unknowns using a computer that can execute 1 gigaflop per second. Do the
same for the backward phase. (See Table 1.)

(b) How many gigaflops per second must a computer be able to execute to find the LU-decomposition of a
matrix of size 10,000 × 10,000 in less than 0.5 s? (See Table 1.)



(d) Find $A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$.

(d) Find $A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$.




Answer: 



(a) $6.67 \times 10^5$ s for forward phase, 10 s for backward phase
(b) 1334 



6. About how many teraflops per second must a computer be able to execute to find the inverse of a matrix of size
100,000 × 100,000 in less than 0.5 s? (1 teraflop = $10^{12}$ flops.)

In Exercises 7-10, A and B are n × n matrices and c is a real number.

7. How many flops are required to compute cA?
Answer:

$n^2$ flops

8. How many flops are required to compute A + B?

9. How many flops are required to compute AB?

Answer:
$2n^3 - n^2$ flops

10. If A is a diagonal matrix and k is a positive integer, how many flops are required to compute $A^k$?

9.5 Singular Value Decomposition

In this section we will discuss an extension of the diagonalization theory for n × n symmetric matrices to general
m × n matrices. The results that we will develop in this section have applications to compression, storage, and
transmission of digitized information and form the basis for many of the best computational algorithms that are
currently available for solving linear systems.

Decompositions of Square Matrices 

We saw in Formula 2 of Section 7.2 that every symmetric matrix A can be expressed as

\[ A = PDP^T \tag{1} \]

where P is an n × n orthogonal matrix of eigenvectors of A, and D is the diagonal matrix whose diagonal entries are
the eigenvalues corresponding to the column vectors of P. In this section we will call (1) an eigenvalue
decomposition of A (abbreviated EVD of A).

If an n × n matrix A is not symmetric, then it does not have an eigenvalue decomposition, but it does have a
Hessenberg decomposition

\[ A = PHP^T \]

in which P is an orthogonal matrix and H is in upper Hessenberg form (Theorem 7.2.4).

Moreover, if A has real eigenvalues, then it has a Schur decomposition

\[ A = PSP^T \]

in which P is an orthogonal matrix and S is upper triangular (Theorem 7.2.3).

The eigenvalue, Hessenberg, and Schur decompositions are important in numerical algorithms not only because the
matrices D, H, and S have simpler forms than A, but also because the orthogonal matrices that appear in these
factorizations do not magnify roundoff error. To see why this is so, suppose that $\hat{\mathbf{x}}$ is a column vector whose entries
are known exactly and that

\[ \mathbf{x} = \hat{\mathbf{x}} + \mathbf{e} \]

is the vector that results when roundoff error is present in the entries of $\hat{\mathbf{x}}$.

If P is an orthogonal matrix, then the length-preserving property of orthogonal transformations implies that

\[ \|P\mathbf{x} - P\hat{\mathbf{x}}\| = \|\mathbf{x} - \hat{\mathbf{x}}\| = \|\mathbf{e}\| \]

which tells us that the error in approximating $P\hat{\mathbf{x}}$ by $P\mathbf{x}$ has the same magnitude as the error in approximating $\hat{\mathbf{x}}$ by
$\mathbf{x}$.
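The length-preserving property can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the rotation angle, the "exact" vector, and the simulated error are hypothetical values chosen for the demonstration, and a 2 × 2 rotation matrix stands in for a general orthogonal matrix P.

```python
import math

def matvec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]

def norm(x):
    """Euclidean length of the vector x."""
    return math.sqrt(sum(t * t for t in x))

theta = 0.7  # arbitrary rotation angle, chosen only for illustration
P = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]  # a 2 x 2 orthogonal matrix

x_hat = [3.0, -4.0]   # vector whose entries are "known exactly" (hypothetical data)
e = [1e-3, -2e-3]     # simulated roundoff error (hypothetical data)
x = [a + b for a, b in zip(x_hat, e)]

# Error before and after applying the orthogonal matrix P.
error_before = norm([a - b for a, b in zip(x, x_hat)])
error_after = norm([a - b for a, b in zip(matvec(P, x), matvec(P, x_hat))])
print(error_before, error_after)  # the two values agree up to floating-point rounding
```

Replacing P by an invertible matrix that is not orthogonal will, in general, change the size of the error, which is the numerical concern raised in the text.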

There are two main paths that one might follow in looking for other kinds of decompositions of a general square
matrix A: One might look for decompositions of the form

\[ A = PJP^{-1} \]

in which P is invertible but not necessarily orthogonal, or one might look for decompositions of the form

\[ A = U\Sigma V^T \]

in which U and V are orthogonal but not necessarily the same. The first path leads to decompositions in which J is
either diagonal or a certain kind of block diagonal matrix, called a Jordan canonical form in honor of the French
mathematician Camille Jordan (see p. 510). Jordan canonical forms, which we will not consider in this text, are
important theoretically and in certain applications, but they are of lesser importance numerically because of the
roundoff difficulties that result from the lack of orthogonality in P. In this section we will focus on the second path.



Singular Values 

Since matrix products of the form $A^TA$ will play an important role in our work, we will begin with two basic
theorems about them.

THEOREM 9.5.1

If A is an m × n matrix, then:

(a) $A$ and $A^TA$ have the same null space.

(b) $A$ and $A^TA$ have the same row space.

(c) $A^T$ and $A^TA$ have the same column space.

(d) $A$ and $A^TA$ have the same rank.



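We will prove part (a) and leave the remaining proofs for the exercises.

Proof (a) We must show that every solution of $A\mathbf{x} = \mathbf{0}$ is a solution of $A^TA\mathbf{x} = \mathbf{0}$, and conversely. If $\mathbf{x}_0$ is any
solution of $A\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is also a solution of $A^TA\mathbf{x} = \mathbf{0}$ since

\[ A^TA\mathbf{x}_0 = A^T(A\mathbf{x}_0) = A^T\mathbf{0} = \mathbf{0} \]

Conversely, if $\mathbf{x}_0$ is any solution of $A^TA\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is in the null space of $A^TA$ and hence is orthogonal to all
vectors in the row space of $A^TA$ by part (q) of Theorem 4.8.10.

However, $A^TA$ is symmetric, so $\mathbf{x}_0$ is also orthogonal to every vector in the column space of $A^TA$. In particular, $\mathbf{x}_0$
must be orthogonal to the vector $(A^TA)\mathbf{x}_0$; that is,

\[ \mathbf{x}_0 \cdot (A^TA)\mathbf{x}_0 = 0 \]

Using the first formula in Table 1 of Section 3.2 and properties of the transpose operation we can rewrite this as

\[ \mathbf{x}_0^T(A^TA)\mathbf{x}_0 = (A\mathbf{x}_0)^T(A\mathbf{x}_0) = (A\mathbf{x}_0) \cdot (A\mathbf{x}_0) = \|A\mathbf{x}_0\|^2 = 0 \]

which implies that $A\mathbf{x}_0 = \mathbf{0}$, thereby proving that $\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{0}$.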
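A small exact-arithmetic check can make parts (a) and (d) of Theorem 9.5.1 concrete. The Python sketch below is an illustration only: the 2 × 3 matrix and the vector `x0` are hypothetical examples, and the helper names (`transpose`, `matmul`, `rank`) are introduced here for the demonstration.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

A = [[1, 2, 3],
     [2, 4, 6]]              # hypothetical 2 x 3 matrix of rank 1
AtA = matmul(transpose(A), A)

x0 = [[2], [-1], [0]]        # a solution of Ax = 0, written as a column
print(rank(A), rank(AtA))    # the ranks agree, as part (d) asserts
print(matmul(A, x0), matmul(AtA, x0))  # both products are zero vectors, as part (a) asserts
```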



THEOREM 9.5.2

If A is an m × n matrix, then:

(a) $A^TA$ is orthogonally diagonalizable.

(b) The eigenvalues of $A^TA$ are nonnegative.



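Proof (a) The matrix $A^TA$, being symmetric, is orthogonally diagonalizable by Theorem 7.2.1.

Proof (b) Since $A^TA$ is orthogonally diagonalizable, there is an orthonormal basis for $R^n$ consisting of
eigenvectors of $A^TA$, say $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. If we let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the corresponding eigenvalues, then for
$1 \le i \le n$ we have

\[
\begin{aligned}
\|A\mathbf{v}_i\|^2 &= A\mathbf{v}_i \cdot A\mathbf{v}_i = \mathbf{v}_i \cdot A^TA\mathbf{v}_i \qquad \text{[Formula (26) of Section 3.2]} \\
&= \mathbf{v}_i \cdot \lambda_i\mathbf{v}_i = \lambda_i(\mathbf{v}_i \cdot \mathbf{v}_i) = \lambda_i\|\mathbf{v}_i\|^2 = \lambda_i
\end{aligned}
\]

It follows from this relationship that $\lambda_i \ge 0$.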

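DEFINITION 1

If A is an m × n matrix, and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A^TA$, then the numbers

\[ \sigma_1 = \sqrt{\lambda_1}, \quad \sigma_2 = \sqrt{\lambda_2}, \quad \ldots, \quad \sigma_n = \sqrt{\lambda_n} \]

are called the singular values of A.

We will assume throughout this section that the
eigenvalues of $A^TA$ are named so that

\[ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0 \]

and hence that

\[ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0 \]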

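EXAMPLE 1 Singular Values

Find the singular values of the matrix

\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \]

Solution The first step is to find the eigenvalues of the matrix

\[ A^TA = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]

The characteristic polynomial of $A^TA$ is

\[ \lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1) \]

so the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ and the singular values of A in order of decreasing
size are

\[ \sigma_1 = \sqrt{\lambda_1} = \sqrt{3}, \qquad \sigma_2 = \sqrt{\lambda_2} = 1 \]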
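The computation in Example 1 can be retraced in a few lines of Python. This is a minimal sketch that assumes the matrix has exactly two columns, so that $A^TA$ is 2 × 2 and its eigenvalues come from the quadratic formula; the function name `singular_values_two_columns` is hypothetical and introduced only for this illustration.

```python
import math

def singular_values_two_columns(A):
    """Singular values of a matrix with two columns, via the eigenvalues of A^T A."""
    # Entries of the symmetric 2 x 2 matrix A^T A = [[a, b], [b, d]].
    a = sum(row[0] * row[0] for row in A)
    b = sum(row[0] * row[1] for row in A)
    d = sum(row[1] * row[1] for row in A)
    # Roots of the characteristic polynomial lambda^2 - (a + d)*lambda + (a*d - b^2).
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    eigenvalues = [mean + radius, mean - radius]   # already in decreasing order
    # Clamp tiny negative rounding errors before taking square roots.
    return [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]

A = [[1, 1],
     [0, 1],
     [1, 0]]
sigma = singular_values_two_columns(A)
print(sigma)  # approximately [sqrt(3), 1]
```

For larger matrices one would normally rely on a numerical library rather than on the characteristic polynomial; the closed form is used here only because the example is 2 × 2.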



Singular Value Decomposition 

Before turning to the main result in this section, we will find it useful to extend the notion of a "main diagonal" to
matrices that are not square. We define the main diagonal of an m × n matrix to be the line of entries shown in
Figure 9.5.1; it starts at the upper left corner and extends diagonally as far as it can go. We will refer to the entries
on the main diagonal as the diagonal entries.

[Figure 9.5.1: Main diagonal of a matrix that is not square]

We are now ready to consider the main result in this section, which is concerned with a specific way of factoring a
general m × n matrix A. This factorization, called singular value decomposition (abbreviated SVD), will be given in
two forms, a brief form that captures the main idea, and an expanded form that spells out the details. The proof is
given at the end of this section.



THEOREM 9.5.3 Singular Value Decomposition

If A is an m × n matrix, then A can be expressed in the form

\[ A = U\Sigma V^T \]

where U and V are orthogonal matrices and $\Sigma$ is an m × n matrix whose diagonal entries are the singular
values of A and whose other entries are zero.



Harry Bateman (1882-1946) 



Historical Note The term singular value is apparently due to the British-born mathematician Harry
Bateman, who used it in a research paper published in 1908. Bateman emigrated to the United States in
1910, teaching at Bryn Mawr College, Johns Hopkins University, and finally at the California Institute of
Technology. Interestingly, he was awarded his Ph.D. in 1913 by Johns Hopkins, at which point he
was already an eminent mathematician with 60 publications to his name.
[Image: Courtesy of the Archives, California Institute of Technology]



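THEOREM 9.5.4 Singular Value Decomposition (Expanded Form)

If A is an m × n matrix of rank k, then A can be factored as

\[
A = U\Sigma V^T =
\left[\begin{array}{cccc|ccc} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_k & \mathbf{u}_{k+1} & \cdots & \mathbf{u}_m \end{array}\right]
\left[\begin{array}{c|c}
\begin{matrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{matrix} & 0_{k \times (n-k)} \\ \hline
0_{(m-k) \times k} & 0_{(m-k) \times (n-k)}
\end{array}\right]
\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \\ \mathbf{v}_{k+1}^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix}
\]

in which U, $\Sigma$, and V have sizes m × m, m × n, and n × n, respectively, and in which

(a) $V = [\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \cdots \;\; \mathbf{v}_n]$ orthogonally diagonalizes $A^TA$.

(b) The nonzero diagonal entries of $\Sigma$ are $\sigma_1 = \sqrt{\lambda_1}, \sigma_2 = \sqrt{\lambda_2}, \ldots, \sigma_k = \sqrt{\lambda_k}$, where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the
nonzero eigenvalues of $A^TA$ corresponding to the column vectors of V.

(c) The column vectors of V are ordered so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$.

(d) $\mathbf{u}_i = \dfrac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \dfrac{1}{\sigma_i}A\mathbf{v}_i \quad (i = 1, 2, \ldots, k)$

(e) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ is an orthonormal basis for col(A).

(f) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_m\}$ is an extension of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ to an orthonormal basis for $R^m$.

The vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k$ are called the left
singular vectors of A, and the vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are called the right singular vectors
of A.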



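EXAMPLE 2 Singular Value Decomposition if A Is Not Square

Find a singular value decomposition of the matrix

\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \]

Solution We showed in Example 1 that the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ and that the
corresponding singular values of A are $\sigma_1 = \sqrt{3}$ and $\sigma_2 = 1$. We leave it for you to verify that

\[ \mathbf{v}_1 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix} \]

are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively, and that $V = [\mathbf{v}_1 \mid \mathbf{v}_2]$ orthogonally
diagonalizes $A^TA$. From part (d) of Theorem 9.5.4, the vectors

\[ \mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{\sqrt{3}}{3} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix} \]

\[ \mathbf{u}_2 = \frac{1}{\sigma_2}A\mathbf{v}_2 = (1) \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \]

are two of the three column vectors of U. Note that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthonormal, as expected. We could
extend the set $\{\mathbf{u}_1, \mathbf{u}_2\}$ to an orthonormal basis for $R^3$. However, the computations will be easier if
we first remove the messy radicals by multiplying $\mathbf{u}_1$ and $\mathbf{u}_2$ by appropriate scalars. Thus, we will look
for a unit vector $\mathbf{u}_3$ that is orthogonal to

\[ \sqrt{6}\,\mathbf{u}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \sqrt{2}\,\mathbf{u}_2 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \]

To satisfy these two orthogonality conditions, the vector $\mathbf{u}_3$ must be a solution of the homogeneous
linear system

\[ \begin{bmatrix} 2 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

We leave it for you to show that a general solution of this system is

\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \]

Normalizing the vector on the right yields

\[ \mathbf{u}_3 = \begin{bmatrix} -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix} \]

Thus, the singular value decomposition of A is

\[
\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A} =
\underbrace{\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma}
\underbrace{\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}}_{V^T}
\]

You may want to confirm the validity of this equation by multiplying out the matrices on the right side.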
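The suggested confirmation can also be carried out numerically. The Python sketch below enters the three factors obtained for the matrix A of Example 2 (with rows (1, 1), (0, 1), (1, 0)), multiplies them, and also forms $U^TU$ to check that the columns of U are orthonormal. The helper names `matmul` and `transpose` are introduced here only for the illustration.

```python
import math

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

# The three factors of the singular value decomposition from Example 2.
U = [[r6 / 3, 0.0, -1 / r3],
     [r6 / 6, -r2 / 2, 1 / r3],
     [r6 / 6, r2 / 2, 1 / r3]]
Sigma = [[r3, 0.0],
         [0.0, 1.0],
         [0.0, 0.0]]
Vt = [[r2 / 2, r2 / 2],
      [r2 / 2, -r2 / 2]]

product = matmul(matmul(U, Sigma), Vt)  # should reproduce A up to rounding
gram = matmul(transpose(U), U)          # should be the 3 x 3 identity up to rounding
for row in product:
    print([round(entry, 12) for entry in row])
```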




Camille Jordan (1838-1922) 




Herman Klaus Weyl (1885-1955) 




Gene H. Golub (1932-) 



Historical Note The theory of singular value decompositions can be traced back to the work of five
people: the Italian mathematician Eugenio Beltrami, the French mathematician Camille Jordan, the English
mathematician James Sylvester (see p. 34), and the German mathematicians Erhard Schmidt (see p. 360)
and Herman Weyl. More recently, the pioneering efforts of the American mathematician
Gene Golub produced a stable and efficient algorithm for computing it. Beltrami and Jordan were the
progenitors of the decomposition — Beltrami gave a proof of the result for real, invertible matrices with 
distinct singular values in 1873. Subsequently, Jordan refined the theory and eliminated the unnecessary 
restrictions imposed by Beltrami. Sylvester, apparently unfamiliar with the work of Beltrami and Jordan, 
rediscovered the result in 1889 and suggested its importance. Schmidt was the first person to show that the 
singular value decomposition could be used to approximate a matrix by another matrix with lower rank, 
and, in so doing, he transformed it from a mathematical curiosity to an important practical tool. Weyl 
showed how to find the lower rank approximations in the presence of error. 

[Images: wikipedia (Beltrami); The Granger Collection, New York (Jordan); Courtesy Electronic Publishing
Services, Inc., New York City (Weyl); wikipedia (Golub)]



OPTIONAL 



We conclude this section with an optional proof of Theorem 9.5.4. 



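Proof of Theorem 9.5.4 For notational simplicity we will prove this theorem in the case where A is an n × n
matrix. To modify the argument for an m × n matrix you need only make the notational adjustments required to
account for the possibility that m > n or n > m.

The matrix $A^TA$ is symmetric, so it has an eigenvalue decomposition

\[ A^TA = VDV^T \]

in which the column vectors of

\[ V = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n] \]

are unit eigenvectors of $A^TA$, and D is a diagonal matrix whose successive diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the
eigenvalues of $A^TA$ corresponding in succession to the column vectors of V. Since A is assumed to have rank k, it
follows from Theorem 9.5.1 that $A^TA$ also has rank k. It follows as well that D has rank k, since it is similar to $A^TA$
and rank is a similarity invariant. Thus, D can be expressed in the form

\[
D = \begin{bmatrix}
\lambda_1 & & & & & \\
& \lambda_2 & & & & \\
& & \ddots & & & \\
& & & \lambda_k & & \\
& & & & 0 & \\
& & & & & \ddots \\
& & & & & & 0
\end{bmatrix} \tag{2}
\]

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > 0$. Now let us consider the set of image vectors

\[ \{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n\} \tag{3} \]

This is an orthogonal set, for if $i \ne j$, then the orthogonality of $\mathbf{v}_i$ and $\mathbf{v}_j$ implies that

\[ A\mathbf{v}_i \cdot A\mathbf{v}_j = \mathbf{v}_i \cdot A^TA\mathbf{v}_j = \mathbf{v}_i \cdot \lambda_j\mathbf{v}_j = \lambda_j(\mathbf{v}_i \cdot \mathbf{v}_j) = 0 \]

Moreover, the first k vectors in (3) are nonzero since we showed in the proof of Theorem 9.5.2(b) that $\|A\mathbf{v}_i\|^2 = \lambda_i$ for
$i = 1, 2, \ldots, n$, and we have assumed that the first k diagonal entries in (2) are positive. Thus,

\[ S = \{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_k\} \]

is an orthogonal set of nonzero vectors in the column space of A. But the column space of A has dimension k since

\[ \operatorname{rank}(A) = \operatorname{rank}(A^TA) = k \]

and hence S, being a linearly independent set of k vectors, must be an orthogonal basis for col(A). If we now
normalize the vectors in S, we will obtain an orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ for col(A) in which

\[ \mathbf{u}_i = \frac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \frac{1}{\sqrt{\lambda_i}}A\mathbf{v}_i \qquad (1 \le i \le k) \]

or, equivalently, in which

\[ A\mathbf{v}_1 = \sqrt{\lambda_1}\,\mathbf{u}_1 = \sigma_1\mathbf{u}_1, \quad A\mathbf{v}_2 = \sqrt{\lambda_2}\,\mathbf{u}_2 = \sigma_2\mathbf{u}_2, \quad \ldots, \quad A\mathbf{v}_k = \sqrt{\lambda_k}\,\mathbf{u}_k = \sigma_k\mathbf{u}_k \tag{4} \]

It follows from Theorem 6.3.6 that we can extend this to an orthonormal basis

\[ \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_n\} \]

for $R^n$. Now let U be the orthogonal matrix

\[ U = [\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_k \;\; \mathbf{u}_{k+1} \;\; \cdots \;\; \mathbf{u}_n] \]

and let $\Sigma$ be the diagonal matrix

\[
\Sigma = \begin{bmatrix}
\sigma_1 & & & & & \\
& \sigma_2 & & & & \\
& & \ddots & & & \\
& & & \sigma_k & & \\
& & & & 0 & \\
& & & & & \ddots \\
& & & & & & 0
\end{bmatrix}
\]

It follows from (4), and the fact that $A\mathbf{v}_j = \mathbf{0}$ for $j > k$, that

\[
\begin{aligned}
U\Sigma &= [\sigma_1\mathbf{u}_1 \;\; \sigma_2\mathbf{u}_2 \;\; \cdots \;\; \sigma_k\mathbf{u}_k \;\; \mathbf{0} \;\; \cdots \;\; \mathbf{0}] \\
&= [A\mathbf{v}_1 \;\; A\mathbf{v}_2 \;\; \cdots \;\; A\mathbf{v}_k \;\; A\mathbf{v}_{k+1} \;\; \cdots \;\; A\mathbf{v}_n] \\
&= AV
\end{aligned}
\]

which we can rewrite using the orthogonality of V as $A = U\Sigma V^T$.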


Concept Review 

• Eigenvalue decomposition 

• Hessenberg decomposition 

• Schur decomposition 

• Magnification of roundoff error 

• Properties that $A$ and $A^TA$ have in common

• $A^TA$ is orthogonally diagonalizable

• Eigenvalues of $A^TA$ are nonnegative

• Singular values 

• Diagonal entries of a matrix that is not square 

• Singular value decomposition 

Skills 

• Find the singular values of an m × n matrix.

• Find a singular value decomposition of an m × n matrix.



Exercise Set 9.5 



In Exercises 1-4, find the distinct singular values of A.
1. $A = [1 \;\; 2 \;\; 0]$
Answer:

$0, \sqrt{5}$



'=[o :] 
'=[^ 1] 



Answer: 



(i 0 

1 f2 



In Exercises 5-12, find a singular value decomposition of A.



'=[: -i] 



Answer: 



A = 



1 L 

J_ J_ 
f2 f2 

-3 0 
0 -4 



/2 0 
0 /2 



Answer: 



.4 = 



8. 



_2_ 

3 3 

-2 2 

-1 1 
2 -2 



_L 
2_ 



ill] 



J 2_ 

_2 L 



Answer: 



A = 



2 _L 
i 0 



1 _L 
■3 f2 



il 

6 

3 



3/2 0 

0 0 
0 0 



_ 1 J_ 

/2 {2 

1 1_ 

/2 v^2 



10. 



.4 = 



11. 



-2 -1 2 

2 1 -2 

1 0 

1 1 

■1 1 



Answer: 



A = 



1 


0 


2 






1 


1 


1 




0 




f2 




0 


/J 


1 


1 


1 


0 


0 


/5 


1'— 

V'2 


f6 







1 0 
0 1 



12. $A = \begin{bmatrix} 6 & 4 \\ 0 & 0 \\ 4 & 0 \end{bmatrix}$



13. Prove: If A is an m × n matrix, then $A^TA$ and $AA^T$ have the same rank.

14. Prove part (d) of Theorem 9.5.1 by using part (a) of the theorem and the fact that $A$ and $A^TA$ have n columns.

15. (a) Prove part (b) of Theorem 9.5.1 by first showing that $\operatorname{row}(A^TA)$ is a subspace of $\operatorname{row}(A)$.
(b) Prove part (c) of Theorem 9.5.1 by using part (b).

16. Let $T: R^n \to R^m$ be a linear transformation whose standard matrix A has the singular value decomposition
$A = U\Sigma V^T$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ and $B' = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m\}$ be the column vectors of V and U,
respectively. Show that $\Sigma = [T]_{B',B}$.

17. Show that the singular values of $A^TA$ are the squares of the singular values of A.

18. Show that if $A = U\Sigma V^T$ is a singular value decomposition of A, then U orthogonally diagonalizes $AA^T$.

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer.

(a) If A is an m × n matrix, then $A^TA$ is an m × m matrix.
Answer:

False

(b) If A is an m × n matrix, then $A^TA$ is a symmetric matrix.
Answer:

True

(c) If A is an m × n matrix, then the eigenvalues of $A^TA$ are positive real numbers.
Answer:

False

(d) If A is an n × n matrix, then A is orthogonally diagonalizable.
Answer:

False

(e) If A is an m × n matrix, then $A^TA$ is orthogonally diagonalizable.
Answer:

True

(f) The eigenvalues of $A^TA$ are the singular values of A.
Answer:

False

(g) Every m × n matrix has a singular value decomposition.
Answer:

True

9.6 Data Compression Using Singular Value Decomposition



Efficient transmission and storage of large quantities of digital data has become a major problem in our technological world. In this section 
we will discuss the role that singular value decomposition plays in compressing digital data so that it can be transmitted more rapidly and 
stored in less space. We assume here that you have read Section 9.5 . 



Reduced Singular Value Decomposition 

Algebraically, the zero rows and columns of the matrix $\Sigma$ in Theorem 9.5.4 are superfluous and can be eliminated by multiplying out the
expression $U\Sigma V^T$ using block multiplication and the partitioning shown in that formula. The products that involve zero blocks as factors
drop out, leaving

\[
A = [\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_k]
\begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{bmatrix}
\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \end{bmatrix} \tag{1}
\]

which is called a reduced singular value decomposition of A. In this text we will denote the matrices on the right side of (1) by $U_1$, $\Sigma_1$, and
$V_1^T$, respectively, and we will write this equation as

\[ A = U_1\Sigma_1V_1^T \tag{2} \]

Note that the sizes of $U_1$, $\Sigma_1$, and $V_1^T$ are m × k, k × k, and k × n, respectively, and that the matrix $\Sigma_1$ is invertible, since its diagonal
entries are positive.

If we multiply out on the right side of (1) using the column-row rule, then we obtain

\[ A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T \tag{3} \]

which is called a reduced singular value expansion of A. This result applies to all matrices, whereas the spectral decomposition [Formula
7 of Section 7.2] applies only to symmetric matrices.

Remark It can be proved that an m × n matrix M has rank 1 if and only if it can be factored as $M = \mathbf{u}\mathbf{v}^T$, where $\mathbf{u}$ is a column vector in
$R^m$ and $\mathbf{v}$ is a column vector in $R^n$. Thus, a reduced singular value decomposition expresses a matrix A of rank k as a linear combination
of k rank 1 matrices.



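EXAMPLE 1 Reduced Singular Value Decomposition

Find a reduced singular value decomposition and a reduced singular value expansion of the matrix

\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \]

Solution In Example 2 of Section 9.5 we found the singular value decomposition

\[
\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A} =
\underbrace{\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma}
\underbrace{\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}}_{V^T} \tag{4}
\]

Since A has rank 2 (verify), it follows from (1) with k = 2 that the reduced singular value decomposition of A corresponding
to (4) is

\[
\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} =
\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} \end{bmatrix}
\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}
\]

This yields the reduced singular value expansion

\[
\begin{aligned}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
&= \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T
= \sqrt{3} \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix}
+ (1) \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix} \\
&= \sqrt{3} \begin{bmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \end{bmatrix}
+ (1) \begin{bmatrix} 0 & 0 \\ -\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} \end{bmatrix}
\end{aligned}
\]

Note that the matrices in the expansion have rank 1, as expected.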
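The reduced singular value expansion can be checked by adding the scaled rank 1 matrices entry by entry. The Python sketch below uses the singular values and singular vectors of the matrix A with rows (1, 1), (0, 1), (1, 0); the helper names `outer` and `scaled_sum` are hypothetical and serve only this illustration.

```python
import math

def outer(u, v):
    """The rank 1 matrix u v^T for a column vector u and a column vector v."""
    return [[ui * vj for vj in v] for ui in u]

def scaled_sum(terms):
    """Add up sigma * M over (sigma, M) pairs, entry by entry."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(s * M[i][j] for s, M in terms) for j in range(cols)]
            for i in range(rows)]

r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

sigma1, sigma2 = r3, 1.0
u1, v1 = [r6 / 3, r6 / 6, r6 / 6], [r2 / 2, r2 / 2]
u2, v2 = [0.0, -r2 / 2, r2 / 2], [r2 / 2, -r2 / 2]

# A = sigma1 * u1 v1^T + sigma2 * u2 v2^T
A_rebuilt = scaled_sum([(sigma1, outer(u1, v1)), (sigma2, outer(u2, v2))])
for row in A_rebuilt:
    print([round(entry, 12) for entry in row])
```

Each call to `outer` produces a matrix whose rows are multiples of a single row vector, which is why every term in the expansion has rank 1.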



Data Compression and Image Processing 

Singular value decompositions can be used to "compress" visual information for the purpose of reducing its required storage space and 
speeding up its electronic transmission. The first step in compressing a visual image is to represent it as a numerical matrix from which the 
visual image can be recovered when needed. 

For example, a black and white photograph might be scanned as a rectangular array of pixels (points) and then stored as a matrix A by 
assigning each pixel a numerical value in accordance with its gray level. If 256 different gray levels are used (0 = white to 255 = black), 
then the entries in the matrix would be integers between 0 and 255. The image can be recovered from the matrix A by printing or 
displaying the pixels with their assigned gray levels. 



Historical Note In 1924 the U.S. Federal Bureau of Investigation (FBI) began collecting fingerprints and handprints and now
has more than 30 million such prints in its files. To reduce the storage cost, the FBI began working with the Los Alamos National
Laboratory, the National Bureau of Standards, and other groups in 1993 to devise rank-based compression methods for storing
prints in digital form. The following figure shows an original fingerprint and a reconstruction from digital data that was
compressed at a ratio of 26:1.

[Figure: Original; Reconstruction]



If the matrix A has size m × n, then one might store each of its mn entries individually. An alternative procedure is to compute the reduced
singular value decomposition

\[ A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T \tag{5} \]

in which $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$, and store the $\sigma$'s, the $\mathbf{u}$'s, and the $\mathbf{v}$'s.

When needed, the matrix A (and hence the image it represents) can be reconstructed from (5). Since each $\mathbf{u}_j$ has m entries and each $\mathbf{v}_j$ has n
entries, this method requires storage space for

\[ km + kn + k = k(m + n + 1) \]

numbers. Suppose, however, that the singular values $\sigma_{r+1}, \ldots, \sigma_k$ are sufficiently small that dropping the corresponding terms in (5)
produces an acceptable approximation

\[ A_r = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T \tag{6} \]

to A and the image that it represents. We call (6) the rank r approximation of A. This matrix requires storage space for only

\[ rm + rn + r = r(m + n + 1) \]

numbers, compared to mn numbers required for entry-by-entry storage of A. For example, the rank 100 approximation of a 1000 × 1000
matrix A requires storage for only

\[ 100(1000 + 1000 + 1) = 200{,}100 \]

numbers, compared to the 1,000,000 numbers required for entry-by-entry storage of A, a compression of almost 80%.
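The storage count r(m + n + 1) is easy to tabulate. The short Python sketch below reproduces the arithmetic for the 1000 × 1000 example; the function names `svd_storage` and `savings` are hypothetical and introduced only for this illustration.

```python
def svd_storage(m, n, r):
    """Numbers stored for a rank r approximation: r sigmas, r u's (m entries each), r v's (n entries each)."""
    return r * (m + n + 1)

def savings(m, n, r):
    """Fraction of storage saved relative to storing all m*n entries individually."""
    return 1 - svd_storage(m, n, r) / (m * n)

print(svd_storage(1000, 1000, 100))  # 200100
print(savings(1000, 1000, 100))      # just under 0.80
```

Note that the savings become negative once r(m + n + 1) exceeds mn, so the method only pays off when the rank r of the approximation is small relative to the size of the matrix.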

Figure 9.6.1 shows some approximations of a digitized mandrill image obtained using (6).

[Figure 9.6.1: approximations of rank 4, rank 10, rank 20, rank 50, and rank 128]



Concept Review 

• Reduced singular value decomposition 

• Reduced singular value expansion 

• Rank of an approximation 

Skills 

• Find the reduced singular value decomposition of an m × n matrix.

• Find the reduced singular value expansion of an m × n matrix.



Exercise Set 9.6 



In Exercises 1-4, find a reduced singular value decomposition of A. [Note: Each matrix appears in Exercise Set 9.5, where you were
asked to find its (unreduced) singular value decomposition.]

1. $A = \begin{bmatrix} -2 & 2 \\ -1 & 1 \\ 2 & -2 \end{bmatrix}$



Answer: 



1 1 

\[2 {2 





' -2 




1 


2 




2 




1 


-2 


3. 


1 


0" 






A = 


1 


1 








-1 


1 







Answer: 



0 



1 

J_ J_ 

{l {2 

1 J_ 

V3 {2 

6 4 
0 0 

4 0 



{Z 0 ' 

0 {2 



1 0 
0 1 



In Exercises 5-8, find a reduced singular value expansion of A.
5. The matrix A in Exercise 1 . 



Answer: 



6. The matrix A in Exercise 2. 

7. The matrix A in Exercise 3. 

Answer: 



1 



f3 



0 

1 



f3 



1 



[1 0] + /2 



[0 1] 



1 



1 



f2 



f3 



8. The matrix A in Exercise 4. 



9. Suppose A is a 200 × 500 matrix. How many numbers must be stored in the rank 100 approximation of A? Compare this with the
number of entries of A.

Answer:

70,100 numbers must be stored; A has 100,000 entries

True-False Exercises 

In parts (a)-(c) determine whether the statement is true or false, and justify your answer. Assume that $U_1\Sigma_1V_1^T$ is a reduced singular
value decomposition of an m × n matrix of rank k.

(a) $U_1$ has size m × k.
Answer:

True

(b) $\Sigma_1$ has size k × k.
Answer:

True

(c) $V_1$ has size k × n.
Answer:

False

Chapter 9 Supplementary Exercises

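1. Find an LU-decomposition of $A = \begin{bmatrix} -6 & 2 \\ 6 & 0 \end{bmatrix}$.
Answer:

\[ A = \begin{bmatrix} 2 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} -3 & 1 \\ 0 & 2 \end{bmatrix} \]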


2. Find the LDU-decomposition of the matrix A in Exercise 1.



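3. Find an LU-decomposition of $A = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 4 & 7 \\ 1 & 3 & 7 \end{bmatrix}$.

Answer:

\[ A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \]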



4. Find the LDU-decomposition of the matrix A in Exercise 3.



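5. Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

(a) Identify the dominant eigenvalue of A and then find the corresponding dominant unit eigenvector $\mathbf{v}$
with positive entries.

(b) Apply the power method with Euclidean scaling to A and $\mathbf{x}_0$, stopping at $\mathbf{x}_5$. Compare your value of
$\mathbf{x}_5$ to the eigenvector $\mathbf{v}$ found in part (a).

(c) Apply the power method with maximum entry scaling to A and $\mathbf{x}_0$, stopping at $\mathbf{x}_5$. Compare your
result with the eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Answer:

(a) $\lambda = 3$, $\mathbf{v} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$

(b) $\mathbf{x}_5 \approx \begin{bmatrix} 0.7100 \\ 0.7041 \end{bmatrix}$, $\mathbf{v} \approx \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix}$

(c) $\mathbf{x}_5 \approx \begin{bmatrix} 1 \\ 0.9918 \end{bmatrix}$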

6. Consider the symmetric matrix 



Discuss the behavior of the power sequence

\[ \mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_k, \ldots \]

with Euclidean scaling for a general nonzero vector $\mathbf{x}_0$. What is it about the matrix that causes the
observed behavior?

7. Suppose that a symmetric matrix A has distinct eigenvalues $\lambda_1 = 8$, $\lambda_2 = 1.4$, $\lambda_3 = 2.3$, and $\lambda_4 = -8$.
What can you say about the convergence of the Rayleigh quotients?

8 [11 

' Find a singular value decomposition of i4 = ^ ^ . 

1 r 

0 0 . 

1 1 



9. 

Find a singular value decomposition of i4 = 



Answer: 



f2 



f2 



0 

1 

0 



1 








"2 


0 


0 


0 


0 


1 


0 


0 









1 

f2 



1 

f2 

1 

f2 



10. Find a reduced singular value decomposition and a reduced singular value expansion of the matrix A in 
Exercise 9. 

11. Find the reduced singular value decomposition of the matrix whose singular value decomposition is 



1 


1 


1 


1 


2 


2 


2 


2 


1 


1 


1 


1 


2 


"2 


"2 


2 


1 


1 


1 




2 


2 


2 




1 


1 


_1 


1 


2 


2 




2 



24 
0 
0 
0 



0 0 
12 0 
0 0 
0 0 



2 
3 
2 
3 
i 
■3 



1 
■3 
2. 
3 
2 
3 



Answer: 



12 0 6 
4 -8 10 



[24 Ol 



i 
■3 



12. Do orthogonally similar matrices have the same singular values? Justify your answer. 

13. If P is the standard matrix for the orthogonal projection of $R^n$ onto a subspace W, what can you say about
the singular values of P?

CHAPTER 10

Applications of Linear Algebra



CHAPTER CONTENTS 

10.1. Constructing Curves and Surfaces Through Specified Points
10.4. Cubic Spline Interpolation

10.5. Markov Chains 

10.6. Graph Theory 

10.7. Games of Strategy 

10.8. Leontief Economic Models 

10.9. Forest Management 

10.10. Computer Graphics 

10.11. Equilibrium Temperature Distributions 

10.12. Computed Tomography 

10.13. Fractals 

10.14. Chaos 

10.15. Cryptography 

10.16. Genetics 

10.17. Age-Specific Population Growth 

10.18. Harvesting of Animal Populations 

10.19. A Least Squares Model for Human Hearing 

10.20. Warps and Morphs 



INTRODUCTION 

This chapter consists of 20 applications of linear algebra. With one clearly marked 



exception, each application is in its own independent section, so sections can be deleted or 
permuted as desired. Each topic begins with a list of linear algebra prerequisites. 

Because our primary objective in this chapter is to present applications of linear algebra, 
proofs are often omitted. Whenever results from other fields are needed, they are stated 
precisely, with motivation where possible, but usually without proof.

10.1 Constructing Curves and Surfaces Through
Specified Points

In this section we describe a technique that uses determinants to construct lines, circles, and general conic 
sections through specified points in the plane. The procedure is also used to pass planes and spheres in 3-space
through fixed points. 



Prerequisites 

Linear Systems 
Determinants 
Analytic Geometry 



The following theorem follows from Theorem 2.3.8. 



THEOREM 10.1.1

A homogeneous linear system with as many equations as unknowns has a nontrivial solution if and only 
if the determinant of the coefficient matrix is zero. 



We will now show how this result can be used to determine equations of various curves and surfaces through 
specified points. 



A Line Through Two Points 

Suppose that $(x_1, y_1)$ and $(x_2, y_2)$ are two distinct points in the plane. There exists a unique line

\[ c_1x + c_2y + c_3 = 0 \tag{1} \]

that passes through these two points (Figure 10.1.1). Note that $c_1$, $c_2$, and $c_3$ are not all zero and that these
coefficients are unique only up to a multiplicative constant. Because $(x_1, y_1)$ and $(x_2, y_2)$ lie on the line,
substituting them in (1) gives the two equations

\[ c_1x_1 + c_2y_1 + c_3 = 0 \tag{2} \]

\[ c_1x_2 + c_2y_2 + c_3 = 0 \tag{3} \]

Figure 10.1.1

The three equations, (1), (2), and (3), can be grouped together and rewritten as

\[
\begin{aligned}
xc_1 + yc_2 + c_3 &= 0 \\
x_1c_1 + y_1c_2 + c_3 &= 0 \\
x_2c_1 + y_2c_2 + c_3 &= 0
\end{aligned}
\]

which is a homogeneous linear system of three equations for $c_1$, $c_2$, and $c_3$. Because $c_1$, $c_2$, and $c_3$ are not all
zero, this system has a nontrivial solution, so the determinant of the coefficient matrix of the system must be
zero. That is,

\[ \begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0 \tag{4} \]

Consequently, every point $(x, y)$ on the line satisfies (4); conversely, it can be shown that every point $(x, y)$ that
satisfies (4) lies on the line.



EXAMPLE 1 Equation of a Line

Find the equation of the line that passes through the two points (2, 1) and (3, 7).

Solution Substituting the coordinates of the two points into Equation (4) gives

\[ \begin{vmatrix} x & y & 1 \\ 2 & 1 & 1 \\ 3 & 7 & 1 \end{vmatrix} = 0 \]

The cofactor expansion of this determinant along the first row then gives

\[ -6x + y + 11 = 0 \]
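Expanding the 3 × 3 determinant along its first row gives the coefficients of the line in closed form, which the following Python sketch computes. The function name `line_through` is hypothetical; the formulas are simply the three cofactors of the first row.

```python
def line_through(p, q):
    """Coefficients (c1, c2, c3) of the line c1*x + c2*y + c3 = 0 through the points p and q.

    They are the cofactors of the first row of the determinant
        | x   y   1 |
        | x1  y1  1 |
        | x2  y2  1 |
    """
    (x1, y1), (x2, y2) = p, q
    if p == q:
        raise ValueError("the two points must be distinct")
    c1 = y1 - y2            # cofactor of x
    c2 = x2 - x1            # cofactor of y
    c3 = x1 * y2 - x2 * y1  # cofactor of 1
    return c1, c2, c3

print(line_through((2, 1), (3, 7)))  # (-6, 1, 11), that is, -6x + y + 11 = 0
```

Because the coefficients are unique only up to a multiplicative constant, any nonzero multiple of the returned triple describes the same line.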



A Circle Through Three Points 

Suppose that there are three distinct points in the plane, $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, not all lying on a
straight line. From analytic geometry we know that there is a unique circle, say,

\[ c_1(x^2 + y^2) + c_2x + c_3y + c_4 = 0 \tag{5} \]

that passes through them (Figure 10.1.2). Substituting the coordinates of the three points into this equation gives

\[ c_1(x_1^2 + y_1^2) + c_2x_1 + c_3y_1 + c_4 = 0 \tag{6} \]

\[ c_1(x_2^2 + y_2^2) + c_2x_2 + c_3y_2 + c_4 = 0 \tag{7} \]

\[ c_1(x_3^2 + y_3^2) + c_2x_3 + c_3y_3 + c_4 = 0 \tag{8} \]

As before, Equations (5) through (8) form a homogeneous linear system with a nontrivial solution for $c_1$, $c_2$, $c_3$,
and $c_4$. Thus the determinant of the coefficient matrix is zero:

\[ \begin{vmatrix} x^2 + y^2 & x & y & 1 \\ x_1^2 + y_1^2 & x_1 & y_1 & 1 \\ x_2^2 + y_2^2 & x_2 & y_2 & 1 \\ x_3^2 + y_3^2 & x_3 & y_3 & 1 \end{vmatrix} = 0 \tag{9} \]

This is a determinant form for the equation of the circle.




Figure 10.1.2 



EXAMPLE 2 Equation of a Circle

Find the equation of the circle that passes through the three points (1, 7), (6, 2), and (4, 6).

Solution Substituting the coordinates of the three points into Equation 9 gives

$$\begin{vmatrix} x^2 + y^2 & x & y & 1 \\ 50 & 1 & 7 & 1 \\ 40 & 6 & 2 & 1 \\ 52 & 4 & 6 & 1 \end{vmatrix} = 0$$

which reduces to

$$10(x^2 + y^2) - 20x - 40y - 200 = 0$$

In standard form this is

$$(x - 1)^2 + (y - 2)^2 = 5^2$$

Thus the circle has center (1, 2) and radius 5.
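The reduction in Example 2 can be reproduced by expanding determinant 9 along its first row. The sketch below is illustrative rather than definitive: `det` and `circle_through` are names of our own choosing, and the determinant is computed by a plain recursive cofactor expansion, which is adequate for matrices this small.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (exact for Fractions)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def circle_through(points):
    """Return (c1, c2, c3, c4) for c1*(x^2 + y^2) + c2*x + c3*y + c4 = 0."""
    rows = [[Fraction(x * x + y * y), Fraction(x), Fraction(y), Fraction(1)]
            for x, y in points]
    # The coefficient of each first-row entry of determinant (9) is its cofactor.
    return tuple((-1) ** j * det([r[:j] + r[j + 1:] for r in rows])
                 for j in range(4))

c1, c2, c3, c4 = circle_through([(1, 7), (6, 2), (4, 6)])
center = (-c2 / (2 * c1), -c3 / (2 * c1))          # from completing the square
radius_squared = center[0] ** 2 + center[1] ** 2 - c4 / c1
print((c1, c2, c3, c4), center, radius_squared)
```

For the three points of Example 2 the coefficients come out as 10, -20, -40, -200, giving center (1, 2) and squared radius 25, in agreement with the standard form above.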



A General Conic Section Through Five Points

In his monumental work Principia Mathematica, Isaac Newton posed and solved the following problem (Book I, Proposition 22, Problem 14): "To describe a conic that shall pass through five given points." Newton solved this problem geometrically, as shown in Figure 10.1.3, in which he passed an ellipse through the points A, B, D, P, C; however, the methods of this section can also be applied.



Figure 10.1.3



The general equation of a conic section in the plane (a parabola, hyperbola, or ellipse, or degenerate forms of 
these curves) is given by 

$$c_1 x^2 + c_2 xy + c_3 y^2 + c_4 x + c_5 y + c_6 = 0$$

This equation contains six coefficients, but we can reduce the number to five if we divide through by any one of 
them that is not zero. Thus only five coefficients must be determined, so five distinct points in the plane are 
sufficient to determine the equation of the conic section (Figure 10.1.4). As before, the equation can be put in 
determinant form (see Exercise 7): 









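$$\begin{vmatrix} x^2 & xy & y^2 & x & y & 1 \\ x_1^2 & x_1 y_1 & y_1^2 & x_1 & y_1 & 1 \\ x_2^2 & x_2 y_2 & y_2^2 & x_2 & y_2 & 1 \\ x_3^2 & x_3 y_3 & y_3^2 & x_3 & y_3 & 1 \\ x_4^2 & x_4 y_4 & y_4^2 & x_4 & y_4 & 1 \\ x_5^2 & x_5 y_5 & y_5^2 & x_5 & y_5 & 1 \end{vmatrix} = 0 \tag{10}$$

Figure 10.1.4

EXAMPLE 3 Equation of an Orbit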



An astronomer who wants to determine the orbit of an asteroid about the Sun sets up a Cartesian 
coordinate system in the plane of the orbit with the Sun at the origin. Astronomical units of 
measurement are used along the axes (1 astronomical unit = mean distance of Earth to Sun = 93 
million miles). By Kepler's first law, the orbit must be an ellipse, so the astronomer makes five 
observations of the asteroid at five different times and finds five points along the orbit to be 
(8.025, 8.310), (10.170, 6.355), (11.202, 3.212), (10.736, 0.375), (9.092, -2.267)
Find the equation of the orbit. 

Solution Substituting the coordinates of the five given points into 10 and rounding to three 
decimal places give 

$$\begin{vmatrix} x^2 & xy & y^2 & x & y & 1 \\ 64.401 & 66.688 & 69.056 & 8.025 & 8.310 & 1 \\ 103.429 & 64.630 & 40.386 & 10.170 & 6.355 & 1 \\ 125.485 & 35.981 & 10.317 & 11.202 & 3.212 & 1 \\ 115.262 & 4.026 & 0.141 & 10.736 & 0.375 & 1 \\ 82.664 & -20.612 & 5.139 & 9.092 & -2.267 & 1 \end{vmatrix} = 0$$

The cofactor expansion of this determinant along the first row yields

$$386.802x^2 - 102.895xy + 446.029y^2 - 2476.443x - 1427.998y - 17109.375 = 0$$

Figure 10.1.5 is an accurate diagram of the orbit, together with the five given points.
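A computation of this size is better left to software. The following sketch expands determinant 10 for the five observations of Example 3 using exact rational arithmetic. Two choices here are ours and not part of the text: the determinant is evaluated by Gaussian elimination instead of a literal cofactor recursion, and the observations are entered unrounded, so the printed coefficients may differ slightly from those above, which were obtained after rounding the entries to three decimal places.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination with exact Fraction arithmetic."""
    m = [row[:] for row in matrix]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]   # a row swap flips the sign
            sign = -sign
        result *= m[col][col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[col])]
    return sign * result

# Observations from Example 3, entered as strings so Fraction keeps them exact.
observations = [("8.025", "8.310"), ("10.170", "6.355"), ("11.202", "3.212"),
                ("10.736", "0.375"), ("9.092", "-2.267")]
points = [(Fraction(x), Fraction(y)) for x, y in observations]
rows = [[x * x, x * y, y * y, x, y, Fraction(1)] for x, y in points]

# Coefficient of the j-th monomial = signed 5x5 minor with column j deleted.
coeffs = [(-1) ** j * det([row[:j] + row[j + 1:] for row in rows])
          for j in range(6)]

residuals = [sum(c * t for c, t in zip(coeffs, row)) for row in rows]
discriminant = coeffs[1] ** 2 - 4 * coeffs[0] * coeffs[2]
print([round(float(c), 3) for c in coeffs])
print("all five points lie on the conic:", all(r == 0 for r in residuals))
print("ellipse:", discriminant < 0)
```

The discriminant test $c_2^2 - 4c_1c_3 < 0$ is the usual criterion for an ellipse and gives an independent check that the result is consistent with Kepler's first law.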



Figure 10.1.5



A Plane Through Three Points 



In Exercise 8 we ask you to show the following: The plane in 3-space with equation

$$c_1 x + c_2 y + c_3 z + c_4 = 0$$

that passes through three noncollinear points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, and $(x_3, y_3, z_3)$ is given by the determinant equation

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0 \tag{11}$$



EXAMPLE 4 Equation of a Plane

The equation of the plane that passes through the three noncollinear points (1, 1, 0), (2, 0, -1), and (2, 9, 2) is

$$\begin{vmatrix} x & y & z & 1 \\ 1 & 1 & 0 & 1 \\ 2 & 0 & -1 & 1 \\ 2 & 9 & 2 & 1 \end{vmatrix} = 0$$

which reduces to

$$2x - y + 3z - 1 = 0$$



A Sphere Through Four Points 



In Exercise 9 we ask you to show the following: The sphere in 3-space with equation

$$c_1(x^2 + y^2 + z^2) + c_2 x + c_3 y + c_4 z + c_5 = 0$$

that passes through four noncoplanar points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$, and $(x_4, y_4, z_4)$ is given by the following determinant equation:

$$\begin{vmatrix} x^2 + y^2 + z^2 & x & y & z & 1 \\ x_1^2 + y_1^2 + z_1^2 & x_1 & y_1 & z_1 & 1 \\ x_2^2 + y_2^2 + z_2^2 & x_2 & y_2 & z_2 & 1 \\ x_3^2 + y_3^2 + z_3^2 & x_3 & y_3 & z_3 & 1 \\ x_4^2 + y_4^2 + z_4^2 & x_4 & y_4 & z_4 & 1 \end{vmatrix} = 0 \tag{12}$$



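EXAMPLE 5 Equation of a Sphere

The equation of the sphere that passes through the four points (0, 3, 2), (1, -1, 1), (2, 1, 0), and (5, 1, 3) is

$$\begin{vmatrix} x^2 + y^2 + z^2 & x & y & z & 1 \\ 13 & 0 & 3 & 2 & 1 \\ 3 & 1 & -1 & 1 & 1 \\ 5 & 2 & 1 & 0 & 1 \\ 35 & 5 & 1 & 3 & 1 \end{vmatrix} = 0$$

This reduces to

$$x^2 + y^2 + z^2 - 4x - 2y - 6z + 5 = 0$$

which in standard form is

$$(x - 2)^2 + (y - 1)^2 + (z - 3)^2 = 9$$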
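Examples 4 and 5 can also be checked without expanding a determinant, by solving the underlying homogeneous system directly. The sketch below does this by Gauss-Jordan elimination with exact fractions. The helper `null_vector` is hypothetical (our own name) and assumes the situation of this section: n equations in n + 1 unknowns with rank n, so that the solution is unique up to a multiplicative constant.

```python
from fractions import Fraction

def null_vector(rows):
    """Return a nontrivial solution c of the homogeneous system rows * c = 0.

    Assumes n equations, n + 1 unknowns, and rank n.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots, r = [], 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot here: a free column
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [v / m[r][col] for v in m[r]]
        for i in range(n_rows):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == n_rows:
            break
    free = next(c for c in range(n_cols) if c not in pivots)
    solution = [Fraction(0)] * n_cols
    solution[free] = Fraction(1)          # fix the free unknown at 1
    for row, col in zip(m, pivots):
        solution[col] = -row[free]        # back out each pivot unknown
    return solution

# Example 4: rows (x, y, z, 1) for the three points of the plane.
plane = null_vector([[1, 1, 0, 1], [2, 0, -1, 1], [2, 9, 2, 1]])

# Example 5: rows (x^2 + y^2 + z^2, x, y, z, 1) for the four points of the sphere.
sphere_points = [(0, 3, 2), (1, -1, 1), (2, 1, 0), (5, 1, 3)]
sphere = null_vector([[x * x + y * y + z * z, x, y, z, 1]
                      for x, y, z in sphere_points])
sphere = [c / sphere[0] for c in sphere]  # scale so that c1 = 1

print(plane)
print(sphere)
```

The plane coefficients come out as a constant multiple of (2, -1, 3, -1), and after scaling the sphere coefficients are (1, -4, -2, -6, 5), matching the reduced equations in the two examples.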



Exercise Set 10.1 

1. Find the equations of the lines that pass through the following points:

(a) (1, -1), (2, 2)

(b) (0, 1), (1, -1)

Answer:

(a) $y = 3x - 4$

(b) $y = -2x + 1$

2. Find the equations of the circles that pass through the following points:

(a) (2, 6), (2, 0), (5, 3)

(b) (2, -2), (3, 5), (-4, 6)

Answer:

(a) $x^2 + y^2 - 4x - 6y + 4 = 0$ or $(x - 2)^2 + (y - 3)^2 = 9$

(b) $x^2 + y^2 + 2x - 4y - 20 = 0$ or $(x + 1)^2 + (y - 2)^2 = 25$

3. Find the equation of the conic section that passes through the points (0, 0), (0, -1), (2, 0), (2, -5), and (4, -1).

Answer:

$x^2 + 2xy + y^2 - 2x + y = 0$ (a parabola)

4. Find the equations of the planes in 3-space that pass through the following points:

(a) (1, 1, -3), (1, -1, 1), (0, -1, 2)

(b) (2, 3, 1), (2, -1, -1), (1, 2, 1)

Answer:

(a) $x + 2y + z = 0$

(b) $-x + y - 2z + 1 = 0$

5. (a) Alter Equation 11 so that it determines the plane that passes through the origin and is parallel to the plane that passes through three specified noncollinear points.

(b) Find the two planes described in part (a) corresponding to the triplets of points in Exercises 4(a) and 4(b).

Answer:

(a)

$$\begin{vmatrix} x & y & z & 0 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0$$

(b) $x + 2y + z = 0$; $-x + y - 2z = 0$

6. Find the equations of the spheres in 3-space that pass through the following points:

(a) (1, 2, 3), (-1, 2, 1), (1, 0, 1), (1, 2, -1)

(b) (0, 1, -2), (1, 3, 1), (2, -1, 0), (3, 1, -1)

Answer:

(a) $x^2 + y^2 + z^2 - 2x - 4y - 2z = -2$ or $(x - 1)^2 + (y - 2)^2 + (z - 1)^2 = 4$

(b) $x^2 + y^2 + z^2 - 2x - 2y = 3$ or $(x - 1)^2 + (y - 1)^2 + z^2 = 5$

7. Show that Equation 10 is the equation of the conic section that passes through five given distinct points in the 
plane. 

8. Show that Equation 11 is the equation of the plane in 3-space that passes through three given noncollinear 
points. 

9. Show that Equation 12 is the equation of the sphere in 3-space that passes through four given noncoplanar 
points. 

10. Find a determinant equation for the parabola of the form

$$c_1 y + c_2 x^2 + c_3 x + c_4 = 0$$

that passes through three given noncollinear points in the plane.

Answer:

$$\begin{vmatrix} y & x^2 & x & 1 \\ y_1 & x_1^2 & x_1 & 1 \\ y_2 & x_2^2 & x_2 & 1 \\ y_3 & x_3^2 & x_3 & 1 \end{vmatrix} = 0$$

11. What does Equation 9 become if the three distinct points are collinear?

Answer:

The equation of the line through the three collinear points

12. What does Equation 11 become if the three distinct points are collinear?

Answer:

0 = 0

13. What does Equation 12 become if the four points are coplanar?

Answer:

The equation of the plane through the four coplanar points

Section 10.1 Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 

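T1. The general equation of a quadric surface is given by

$$a_1 x^2 + a_2 y^2 + a_3 z^2 + a_4 xy + a_5 xz + a_6 yz + a_7 x + a_8 y + a_9 z + a_{10} = 0$$

Given nine points on this surface, it may be possible to determine its equation.

(a) Show that if the nine points $(x_i, y_i, z_i)$ for $i = 1, 2, 3, \ldots, 9$ lie on this surface, and if they determine uniquely the equation of this surface, then its equation can be written in determinant form as

$$\begin{vmatrix}
x^2 & y^2 & z^2 & xy & xz & yz & x & y & z & 1 \\
x_1^2 & y_1^2 & z_1^2 & x_1 y_1 & x_1 z_1 & y_1 z_1 & x_1 & y_1 & z_1 & 1 \\
x_2^2 & y_2^2 & z_2^2 & x_2 y_2 & x_2 z_2 & y_2 z_2 & x_2 & y_2 & z_2 & 1 \\
x_3^2 & y_3^2 & z_3^2 & x_3 y_3 & x_3 z_3 & y_3 z_3 & x_3 & y_3 & z_3 & 1 \\
x_4^2 & y_4^2 & z_4^2 & x_4 y_4 & x_4 z_4 & y_4 z_4 & x_4 & y_4 & z_4 & 1 \\
x_5^2 & y_5^2 & z_5^2 & x_5 y_5 & x_5 z_5 & y_5 z_5 & x_5 & y_5 & z_5 & 1 \\
x_6^2 & y_6^2 & z_6^2 & x_6 y_6 & x_6 z_6 & y_6 z_6 & x_6 & y_6 & z_6 & 1 \\
x_7^2 & y_7^2 & z_7^2 & x_7 y_7 & x_7 z_7 & y_7 z_7 & x_7 & y_7 & z_7 & 1 \\
x_8^2 & y_8^2 & z_8^2 & x_8 y_8 & x_8 z_8 & y_8 z_8 & x_8 & y_8 & z_8 & 1 \\
x_9^2 & y_9^2 & z_9^2 & x_9 y_9 & x_9 z_9 & y_9 z_9 & x_9 & y_9 & z_9 & 1
\end{vmatrix} = 0$$

(b) Use the result in part (a) to determine the equation of the quadric surface that passes through the points (1, 2, 3), (2, 1, 7), (0, 4, 6), (3, -1, 4), (3, 0, 11), (-1, 5, 8), (9, -8, 3), (4, 5, 3), and (-2, 6, 10).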

T2.

(a) A hyperplane in the n-dimensional Euclidean space $R^n$ has an equation of the form

$$a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + a_{n+1} = 0$$

where $a_i$, $i = 1, 2, 3, \ldots, n + 1$, are constants, not all zero, and $x_i$, $i = 1, 2, 3, \ldots, n$, are variables for which

$$(x_1, x_2, x_3, \ldots, x_n) \in R^n$$

A point

$$(x_{10}, x_{20}, x_{30}, \ldots, x_{n0}) \in R^n$$

lies on this hyperplane if

$$a_1 x_{10} + a_2 x_{20} + a_3 x_{30} + \cdots + a_n x_{n0} + a_{n+1} = 0$$

Given that the n points $(x_{1i}, x_{2i}, x_{3i}, \ldots, x_{ni})$, $i = 1, 2, 3, \ldots, n$, lie on this hyperplane and that they uniquely determine the equation of the hyperplane, show that the equation of the hyperplane can be written in determinant form as

$$\begin{vmatrix}
x_1 & x_2 & x_3 & \cdots & x_n & 1 \\
x_{11} & x_{21} & x_{31} & \cdots & x_{n1} & 1 \\
x_{12} & x_{22} & x_{32} & \cdots & x_{n2} & 1 \\
x_{13} & x_{23} & x_{33} & \cdots & x_{n3} & 1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
x_{1n} & x_{2n} & x_{3n} & \cdots & x_{nn} & 1
\end{vmatrix} = 0$$

(b) Determine the equation of the hyperplane in $R^9$ that goes through the following nine points:

(1, 2, 3, 4, 5, 6, 7, 8, 9)
(2, 3, 4, 5, 6, 7, 8, 9, 1)
(3, 4, 5, 6, 7, 8, 9, 1, 2)
(4, 5, 6, 7, 8, 9, 1, 2, 3)
(5, 6, 7, 8, 9, 1, 2, 3, 4)
(6, 7, 8, 9, 1, 2, 3, 4, 5)
(7, 8, 9, 1, 2, 3, 4, 5, 6)
(8, 9, 1, 2, 3, 4, 5, 6, 7)
(9, 1, 2, 3, 4, 5, 6, 7, 8)

10.2 Geometric Linear Programming



In this section we describe a geometric technique for maximizing or minimizing a linear expression in two 
variables subject to a set of linear constraints. 



Prerequisites 

Linear Systems 
Linear Inequalities 



Linear Programming 

The study of linear programming theory has expanded greatly since the pioneering work of George Dantzig in the late 1940s. Today, linear programming is applied to a wide variety of problems in industry and science. In this section we present a geometric approach to the solution of simple linear programming problems. Let us begin with some examples.

EXAMPLE 1 Maximizing Sales Revenue

A candy manufacturer has 130 pounds of chocolate-covered cherries and 170 pounds of 
chocolate-covered mints in stock. He decides to sell them in the form of two different mixtures. 
One mixture will contain half cherries and half mints by weight and will sell for $2.00 per 
pound. The other mixture will contain one-third cherries and two-thirds mints by weight and 
will sell for $1.25 per pound. How many pounds of each mixture should the candy 
manufacturer prepare in order to maximize his sales revenue? 

Mathematical Formulation Let the mixture of half cherries and half mints be called mix A, and let $x_1$ be the number of pounds of this mixture to be prepared. Let the mixture of one-third cherries and two-thirds mints be called mix B, and let $x_2$ be the number of pounds of this mixture to be prepared. Since mix A sells for $2.00 per pound and mix B sells for $1.25 per pound, the total sales z (in dollars) will be

$$z = 2.00x_1 + 1.25x_2$$

Since each pound of mix A contains 1/2 pound of cherries and each pound of mix B contains 1/3 pound of cherries, the total number of pounds of cherries used in both mixtures is

$$\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2$$

Similarly, since each pound of mix A contains 1/2 pound of mints and each pound of mix B contains 2/3 pound of mints, the total number of pounds of mints used in both mixtures is

$$\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2$$

Because the manufacturer can use at most 130 pounds of cherries and 170 pounds of mints, we must have

$$\begin{aligned} \tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\ \tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170 \end{aligned}$$

Furthermore, since $x_1$ and $x_2$ cannot be negative numbers, we must have

$$x_1 \ge 0 \quad \text{and} \quad x_2 \ge 0$$

The problem can therefore be formulated mathematically as follows: Find values of $x_1$ and $x_2$ that maximize

$$z = 2.00x_1 + 1.25x_2$$

subject to

$$\begin{aligned} \tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\ \tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Later in this section we will show how to solve this type of mathematical problem geometrically.



EXAMPLE 2 Maximizing Annual Yield

A woman has up to $10,000 to invest. Her broker suggests investing in two bonds, A and B. Bond A is a rather risky bond with an annual yield of 10%, and bond B is a rather safe bond with an annual yield of 7%. After some consideration, she decides to invest at most $6000 in bond A, to invest at least $2000 in bond B, and to invest at least as much in bond A as in bond B. How should she invest her money in order to maximize her annual yield?

Mathematical Formulation Let $x_1$ be the number of dollars to be invested in bond A, and let $x_2$ be the number of dollars to be invested in bond B. Since each dollar invested in bond A earns $.10 per year and each dollar invested in bond B earns $.07 per year, the total dollar amount z earned each year by both bonds is

$$z = .10x_1 + .07x_2$$

The constraints imposed can be formulated mathematically as follows:

Invest no more than $10,000: $x_1 + x_2 \le 10{,}000$

Invest at most $6000 in bond A: $x_1 \le 6000$

Invest at least $2000 in bond B: $x_2 \ge 2000$

Invest at least as much in bond A as in bond B: $x_1 \ge x_2$

We also have the implicit assumption that $x_1$ and $x_2$ are nonnegative:

$$x_1 \ge 0 \quad \text{and} \quad x_2 \ge 0$$

Thus the complete mathematical formulation of the problem is as follows: Find values of $x_1$ and $x_2$ that maximize

$$z = .10x_1 + .07x_2$$

subject to

$$\begin{aligned} x_1 + x_2 &\le 10{,}000 \\ x_1 &\le 6000 \\ x_2 &\ge 2000 \\ x_1 - x_2 &\ge 0 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$



EXAMPLE 3 Minimizing Cost

A student desires to design a breakfast of cornflakes and milk that is as economical as possible. On the basis of what he eats during his other meals, he decides that his breakfast should supply him with at least 9 grams of protein, at least 1/3 the recommended daily allowance (RDA) of vitamin D, and at least 1/4 the RDA of calcium. He finds the following nutrition and cost information on the milk and cornflakes containers:

              Milk (1/2 cup)    Cornflakes (1 ounce)
Cost          7.5 cents         5.0 cents
Protein       4 grams           2 grams
Vitamin D     1/8 of RDA        1/10 of RDA
Calcium       1/6 of RDA        None

In order not to have his mixture too soggy or too dry, the student decides to limit himself to mixtures that contain 1 to 3 ounces of cornflakes per cup of milk, inclusive. What quantities of milk and cornflakes should he use to minimize the cost of his breakfast?

Mathematical Formulation Let $x_1$ be the quantity of milk used (measured in 1/2-cup units), and let $x_2$ be the quantity of cornflakes used (measured in 1-ounce units). Then if z is the cost of the breakfast in cents, we may write the following.

Cost of breakfast: $z = 7.5x_1 + 5.0x_2$

At least 9 grams protein: $4x_1 + 2x_2 \ge 9$

At least 1/3 RDA vitamin D: $\tfrac{1}{8}x_1 + \tfrac{1}{10}x_2 \ge \tfrac{1}{3}$

At least 1/4 RDA calcium: $\tfrac{1}{6}x_1 \ge \tfrac{1}{4}$

At least 1 ounce cornflakes per cup (two 1/2-cups) of milk: $\dfrac{x_2}{x_1} \ge \dfrac{1}{2}$ (or $x_1 - 2x_2 \le 0$)

At most 3 ounces cornflakes per cup (two 1/2-cups) of milk: $\dfrac{x_2}{x_1} \le \dfrac{3}{2}$ (or $3x_1 - 2x_2 \ge 0$)

As before, we also have the implicit assumption that $x_1 \ge 0$ and $x_2 \ge 0$. Thus the complete mathematical formulation of the problem is as follows: Find values of $x_1$ and $x_2$ that minimize

$$z = 7.5x_1 + 5.0x_2$$

subject to

$$\begin{aligned} 4x_1 + 2x_2 &\ge 9 \\ \tfrac{1}{8}x_1 + \tfrac{1}{10}x_2 &\ge \tfrac{1}{3} \\ \tfrac{1}{6}x_1 &\ge \tfrac{1}{4} \\ x_1 - 2x_2 &\le 0 \\ 3x_1 - 2x_2 &\ge 0 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$



Geometric Solution of Linear Programming Problems 

Each of the preceding three examples is a special case of the following problem. 

Problem

Find values of $x_1$ and $x_2$ that either maximize or minimize

$$z = c_1 x_1 + c_2 x_2 \tag{1}$$

subject to

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 \;&(\le)(\ge)(=)\; b_1 \\ a_{21}x_1 + a_{22}x_2 \;&(\le)(\ge)(=)\; b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 \;&(\le)(\ge)(=)\; b_m \end{aligned} \tag{2}$$

and

$$x_1 \ge 0, \quad x_2 \ge 0 \tag{3}$$

In each of the m conditions of 2, any one of the symbols $\le$, $\ge$, and $=$ may be used.

The problem above is called the general linear programming problem in two variables. The linear function z in 1 is called the objective function. Equations 2 and 3 are called the constraints; in particular, the equations in 3 are called the nonnegativity constraints on the variables $x_1$ and $x_2$.

We will now show how to solve a linear programming problem in two variables graphically. A pair of values $(x_1, x_2)$ that satisfy all of the constraints is called a feasible solution. The set of all feasible solutions determines a subset of the $x_1x_2$-plane called the feasible region. Our desire is to find a feasible solution that maximizes (or, in a minimization problem, minimizes) the objective function. Such a solution is called an optimal solution.

To examine the feasible region of a linear programming problem, let us note that each constraint of the form

$$a_{i1}x_1 + a_{i2}x_2 = b_i$$

defines a line in the $x_1x_2$-plane, whereas each constraint of the form

$$a_{i1}x_1 + a_{i2}x_2 \le b_i \quad \text{or} \quad a_{i1}x_1 + a_{i2}x_2 \ge b_i$$

defines a half-plane that includes its boundary line

$$a_{i1}x_1 + a_{i2}x_2 = b_i$$

Thus the feasible region is always an intersection of finitely many lines and half-planes. For example, the four constraints

$$\begin{aligned} \tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\ \tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

of Example 1 define the half-planes illustrated in parts (a), (b), (c), and (d) of Figure 10.2.1. The feasible region of this problem is thus the intersection of these four half-planes, which is illustrated in Figure 10.2.1e.



Figure 10.2.1 (a)-(d) The half-planes defined by the four constraints of Example 1. (e) The feasible region, with extreme points (0, 0), (0, 255), (180, 120), and (260, 0).

It can be shown that the feasible region of a linear programming problem has a boundary consisting of a finite number of straight line segments. If the feasible region can be enclosed in a sufficiently large circle, it is called bounded (Figure 10.2.1e); otherwise, it is called unbounded (see Figure 10.2.5). If the feasible region is empty (contains no points), then the constraints are inconsistent and the linear programming problem has no solution (see Figure 10.2.6).

Those boundary points of a feasible region that are intersections of two of the straight line boundary segments are called extreme points. (They are also called corner points and vertex points.) For example, in Figure 10.2.1e, we see that the feasible region of Example 1 has four extreme points:

$$(0, 0), \quad (0, 255), \quad (180, 120), \quad (260, 0) \tag{4}$$



The importance of the extreme points of a feasible region is shown by the following theorem. 



THEOREM 10.2.1 Maximum and Minimum Values 

If the feasible region of a linear programming problem is nonempty and bounded, then the objective 
function attains both a maximum and a minimum value, and these occur at extreme points of the 
feasible region. If the feasible region is unbounded, then the objective function may or may not attain 
a maximum or minimum value; however, if it attains a maximum or minimum value, it does so at an 
extreme point. 



Figure 10.2.2 suggests the idea behind the proof of this theorem. Since the objective function

$$z = c_1 x_1 + c_2 x_2$$

of a linear programming problem is a linear function of $x_1$ and $x_2$, its level curves (the curves along which z has constant values) are straight lines. As we move in a direction perpendicular to these level curves, the objective function either increases or decreases monotonically. Within a bounded feasible region, the maximum and minimum values of z must therefore occur at extreme points, as Figure 10.2.2 indicates.

Figure 10.2.2

In the next few examples we use Theorem 10.2.1 to solve several linear programming problems and illustrate 
the variations in the nature of the solutions that may occur. 

EXAMPLE 4 Example 1 Revisited

Figure 10.2.1e shows that the feasible region of Example 1 is bounded. Consequently, from Theorem 10.2.1 the objective function

$$z = 2.00x_1 + 1.25x_2$$

attains both its minimum and maximum values at extreme points. The four extreme points and the corresponding values of z are given in the following table.

Extreme Point (x1, x2)    Value of z = 2.00x1 + 1.25x2
(0, 0)                    0
(0, 255)                  318.75
(180, 120)                510.00
(260, 0)                  520.00

We see that the largest value of z is 520.00 and the corresponding optimal solution is (260, 0). Thus the candy manufacturer attains maximum sales of $520 when he produces 260 pounds of mixture A and none of mixture B.
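The table in Example 4 can be generated mechanically: intersect every pair of boundary lines, discard the intersection points that violate some constraint, and evaluate the objective function at the points that remain. The sketch below does this for Example 1 with exact fractions. The function names and the convention of writing every constraint in the form $a_1x_1 + a_2x_2 \le b$ are our own choices for illustration, and the approach assumes a bounded feasible region as in Theorem 10.2.1.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a1, a2, b), meaning a1*x1 + a2*x2 <= b.
constraints = [
    (Fraction(1, 2), Fraction(1, 3), 130),   # cherries
    (Fraction(1, 2), Fraction(2, 3), 170),   # mints
    (-1, 0, 0),                              # x1 >= 0 rewritten as -x1 <= 0
    (0, -1, 0),                              # x2 >= 0 rewritten as -x2 <= 0
]

def extreme_points(constraints):
    """Intersect every pair of boundary lines and keep the feasible points."""
    found = set()
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = Fraction(a1 * c2 - a2 * c1)
        if det == 0:
            continue  # parallel boundary lines have no single intersection
        x1 = (b * c2 - a2 * d) / det   # Cramer's rule
        x2 = (a1 * d - b * c1) / det
        if all(p * x1 + q * x2 <= r for p, q, r in constraints):
            found.add((x1, x2))
    return found

def objective(point):
    x1, x2 = point
    return 2 * x1 + Fraction(5, 4) * x2   # z = 2.00*x1 + 1.25*x2

points = extreme_points(constraints)
best = max(points, key=objective)
for p in sorted(points):
    print(tuple(map(float, p)), float(objective(p)))
print("optimal:", tuple(map(float, best)), float(objective(best)))
```

Of the six pairwise intersections, two are infeasible, namely (0, 390) and (340, 0), leaving the four extreme points listed in 4; the largest objective value, 520, occurs at (260, 0).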



EXAMPLE 5 Using Theorem 10.2.1

Find values of $x_1$ and $x_2$ that maximize

$$z = x_1 + 3x_2$$

subject to

$$\begin{aligned} 2x_1 + 3x_2 &\le 24 \\ x_1 - x_2 &\le 7 \\ x_2 &\le 6 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Solution In Figure 10.2.3 we have drawn the feasible region of this problem. Since it is bounded, the maximum value of z is attained at one of the five extreme points. The values of the objective function at the five extreme points are given in the following table.

Figure 10.2.3

Extreme Point (x1, x2)    Value of z = x1 + 3x2
(0, 6)                    18
(3, 6)                    21
(9, 2)                    15
(7, 0)                    7
(0, 0)                    0

From this table, the maximum value of z is 21, which is attained at $x_1 = 3$ and $x_2 = 6$.

EXAMPLE 6 Using Theorem 10.2.1

Find values of $x_1$ and $x_2$ that maximize

$$z = 4x_1 + 6x_2$$

subject to

$$\begin{aligned} 2x_1 + 3x_2 &\le 24 \\ x_1 - x_2 &\le 7 \\ x_2 &\le 6 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Solution The constraints in this problem are identical to the constraints in Example 5, so the feasible region of this problem is also given by Figure 10.2.3. The values of the objective function at the extreme points are given in the following table.

Extreme Point (x1, x2)    Value of z = 4x1 + 6x2
(0, 6)                    36
(3, 6)                    48
(9, 2)                    48
(7, 0)                    28
(0, 0)                    0

We see that the objective function attains a maximum value of 48 at two adjacent extreme 
points, (3, 6) and (9, 2) . This shows that an optimal solution to a linear programming problem 
need not be unique. As we ask you to show in Exercise 10, if the objective function has the 
same value at two adjacent extreme points, it has the same value at all points on the straight line 
boundary segment connecting the two extreme points. Thus, in this example the maximum 
value of z is attained at all points on the straight line segment connecting the extreme points 
(3, 6) and (9, 2). 
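The claim that z is constant along the whole segment can be spot-checked numerically. The sketch below samples points $t\,p + (1 - t)\,q$ on the segment joining the two optimal extreme points of Example 6; the sampling step of 1/10 is an arbitrary choice made only for illustration.

```python
from fractions import Fraction

def z(x1, x2):
    return 4 * x1 + 6 * x2          # objective function of Example 6

p, q = (3, 6), (9, 2)               # the two optimal extreme points

def on_segment(t):
    """Point t*p + (1 - t)*q for t in the interval [0, 1]."""
    return tuple(t * a + (1 - t) * b for a, b in zip(p, q))

values = [z(*on_segment(Fraction(k, 10))) for k in range(11)]
print(values)                       # every sampled value equals 48
```

The reason is visible algebraically: along the segment, $z = 4(9 - 6t) + 6(2 + 4t) = 48$ for every t, because the level curves of z are parallel to the boundary line $2x_1 + 3x_2 = 24$.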



EXAMPLE 7 The Feasible Region Is a Line Segment

Find values of $x_1$ and $x_2$ that minimize

$$z = 2x_1 - x_2$$

subject to

$$\begin{aligned} 2x_1 + 3x_2 &= 12 \\ 2x_1 - 3x_2 &\ge 0 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Solution In Figure 10.2.4 we have drawn the feasible region of this problem. Because one of the constraints is an equality constraint, the feasible region is a straight line segment with two extreme points. The values of z at the two extreme points are given in the following table.

Figure 10.2.4

Extreme Point (x1, x2)    Value of z = 2x1 - x2
(3, 2)                    4
(6, 0)                    12

The minimum value of z is thus 4 and is attained at $x_1 = 3$ and $x_2 = 2$.



EXAMPLE 8 Using Theorem 10.2.1

Find values of $x_1$ and $x_2$ that maximize

$$z = 2x_1 + 5x_2$$

subject to

$$\begin{aligned} 2x_1 + x_2 &\ge 8 \\ -4x_1 + x_2 &\le 2 \\ 2x_1 - 3x_2 &\le 0 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Solution The feasible region of this linear programming problem is illustrated in Figure 10.2.5. Since it is unbounded, we are not assured by Theorem 10.2.1 that the objective function attains a maximum value. In fact, it is easily seen that since the feasible region contains points for which both $x_1$ and $x_2$ are arbitrarily large and positive, the objective function

$$z = 2x_1 + 5x_2$$

can be made arbitrarily large and positive. This problem has no optimal solution. Instead, we 
say the problem has an unbounded solution. 
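To make the unboundedness concrete, one can follow a ray that stays inside the feasible region and watch z grow. The particular ray used below, starting at the extreme point (3, 2) and moving in the direction (1, 1), is our own choice for illustration; any feasible ray along which $2x_1 + 5x_2$ increases would serve equally well.

```python
def feasible(x1, x2):
    """The five constraints of Example 8."""
    return (2 * x1 + x2 >= 8 and -4 * x1 + x2 <= 2
            and 2 * x1 - 3 * x2 <= 0 and x1 >= 0 and x2 >= 0)

def z(x1, x2):
    return 2 * x1 + 5 * x2

# Illustrative ray: start at (3, 2) and move in the direction (1, 1).
for t in (0, 10, 100, 1000):
    x1, x2 = 3 + t, 2 + t
    print(t, feasible(x1, x2), z(x1, x2))
```

Along this ray the point $(3 + t, 2 + t)$ satisfies every constraint for all $t \ge 0$, while $z = 16 + 7t$ increases without bound.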




Figure 10.2.5 



EXAMPLE 9 Using Theorem 10.2.1

Find values of $x_1$ and $x_2$ that maximize

$$z = -5x_1 + x_2$$

subject to

$$\begin{aligned} 2x_1 + x_2 &\ge 8 \\ -4x_1 + x_2 &\le 2 \\ 2x_1 - 3x_2 &\le 0 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Solution The above constraints are the same as those in Example 8, so the feasible region of this problem is also given by Figure 10.2.5. In Exercise 11 we ask you to show that the objective function of this problem attains a maximum within the feasible region. By Theorem 10.2.1, this maximum must be attained at an extreme point. The values of z at the two extreme points of the feasible region are given in the following table.

Extreme Point (x1, x2)    Value of z = -5x1 + x2
(1, 6)                    1
(3, 2)                    -13

The maximum value of z is thus 1 and is attained at the extreme point $x_1 = 1$, $x_2 = 6$.

EXAMPLE 10 Inconsistent Constraints

Find values of $x_1$ and $x_2$ that minimize

$$z = 3x_1 - 8x_2$$

subject to

$$\begin{aligned} 2x_1 - x_2 &\le 4 \\ 3x_1 + 11x_2 &\le 33 \\ 3x_1 + 4x_2 &\ge 24 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Solution As can be seen from Figure 10.2.6, the intersection of the five half-planes defined by the five constraints is empty. This linear programming problem has no feasible solutions since the constraints are inconsistent.

Figure 10.2.6 There are no points common to all five shaded half-planes.



Exercise Set 10.2 

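1. Find values of $x_1$ and $x_2$ that maximize

$$z = 3x_1 + 2x_2$$

subject to

$$\begin{aligned} 2x_1 + 3x_2 &\le 6 \\ 2x_1 - x_2 &\ge 0 \\ x_1 &\le 2 \\ x_2 &\le 1 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Answer:

$x_1 = 2$, $x_2 = \tfrac{2}{3}$; maximum value of $z = \tfrac{22}{3}$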



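2. Find values of $x_1$ and $x_2$ that minimize

$z = 3x_1$ [term in $x_2$ illegible in the source]

subject to

$$\begin{aligned} 2x_1 - x_2 &\le -2 \\ 4x_1 - x_2 &\ge 0 \\ x_2 &\le 3 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Answer:

No feasible solutions

3. Find values of $x_1$ and $x_2$ that minimize

$z = -3x_1 +$ [term in $x_2$ illegible in the source]

subject to

$$\begin{aligned} 3x_1 - x_2 &\ge -5 \\ -x_1 + x_2 &\ge 1 \\ 2x_1 + 4x_2 &\ge 12 \\ x_1 &\ge 0 \\ x_2 &\ge 0 \end{aligned}$$

Answer:

Unbounded solution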

4. Solve the linear programming problem posed in Example 2. 

Answer: 

Invest $6000 in bond A and $4000 in bond B; the annual yield is $880. 

5. Solve the linear programming problem posed in Example 3. 

Answer: 

7/9 cup of milk, 25/18 ounces of cornflakes; minimum cost = 335/18 ≈ 18.6¢

6. In Example 5 the constraint $x_1 - x_2 \le 7$ is said to be nonbinding because it can be removed from the problem without affecting the solution. Likewise, the constraint $x_2 \le 6$ is said to be binding because removing it will change the solution.

(a) Which of the remaining constraints are nonbinding and which are binding?

(b) For what values of the right-hand side of the nonbinding constraint $x_1 - x_2 \le 7$ will this constraint become binding? For what values will the resulting feasible set be empty?

(c) For what values of the right-hand side of the binding constraint $x_2 \le 6$ will this constraint become nonbinding? For what values will the resulting feasible set be empty?



Answer: 



(a) $x_1 \ge 0$ and $x_2 \ge 0$ are nonbinding; $2x_1 + 3x_2 \le 24$ is binding

(b) $x_1 - x_2 \le v$ for $v < -3$ is binding and for $v < -6$ yields the empty set.

(c) $x_2 \le v$ for $v \ge 8$ is nonbinding and for $v < 0$ yields the empty set.

7. A trucking firm ships the containers of two companies, A and B. Each container from company A weighs 
40 pounds and is 2 cubic feet in volume. Each container from company B weighs 50 pounds and is 3 cubic 
feet in volume. The trucking firm charges company A $2.20 for each container shipped and charges 
company B $3.00 for each container shipped. If one of the firm's trucks cannot carry more than 37,000 
pounds and cannot hold more than 2000 cubic feet, how many containers from companies A and B should 
a truck carry to maximize the shipping charges? 

Answer: 

550 containers from company A and 300 containers from company B; maximum shipping 
charges = $2110 

8. Repeat Exercise 7 if the trucking firm raises its price for shipping a container from company A to $2.50. 
Answer: 

925 containers from company A and no containers from company B; maximum shipping 
charges = $2312.50 

9. A manufacturer produces sacks of chicken feed from two ingredients, A and B. Each sack is to contain at least 10 ounces of nutrient $N_1$, at least 8 ounces of nutrient $N_2$, and at least 12 ounces of nutrient $N_3$. Each pound of ingredient A contains 2 ounces of nutrient $N_1$, 2 ounces of nutrient $N_2$, and 6 ounces of nutrient $N_3$. Each pound of ingredient B contains 5 ounces of nutrient $N_1$, 3 ounces of nutrient $N_2$, and 4 ounces of nutrient $N_3$. If ingredient A costs 8 cents per pound and ingredient B costs 9 cents per pound, how much of each ingredient should the manufacturer use in each sack of feed to minimize his costs?

Answer:

0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost = 24.8¢

10. If the objective function of a linear programming problem has the same value at two adjacent extreme points, show that it has the same value at all points on the straight line segment connecting the two extreme points. [Hint: If $(x_1', x_2')$ and $(x_1'', x_2'')$ are any two points in the plane, a point $(x_1, x_2)$ lies on the straight line segment connecting them if

$$x_1 = t x_1' + (1 - t) x_1''$$

and

$$x_2 = t x_2' + (1 - t) x_2''$$

where t is a number in the interval [0, 1].]

11. Show that the objective function in Example 9 attains a maximum value in the feasible set. [Hint: 
Examine the level curves of the objective function.] 



Section 10.2 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 

T1. Consider the feasible region consisting of $0 \le x$, $0 \le y$ along with the set of inequalities

- ( An j + ^ ( 4/ j ^ ) 

for $k = 0, 1, 2, \ldots, n - 1$. Maximize the objective function

assuming that (a) n = 1, (b) n = 2, (c) n = 3, (d) n = 4, (e) n = 5, (f) n = 6, (g) n = 7, (h) n = 8, (i) n = 9, (j) n = 10, and (k) n = 11. (l) Next, maximize this objective function using the nonlinear feasible region, $0 \le x$, $0 \le y$ and

(m) Let the results of parts (a) through (k) begin a sequence of values for $z_{\max}$. Do these values approach the value determined in part (l)? Explain.

T2. Repeat Exercise Tl using the objective function z = x ^y. 



Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



10.3 The Earliest Applications of Linear Algebra 

Linear systems can be found in the earliest writings of many ancient civilizations. In this section we give 
some examples of the types of problems that they used to solve. 


Prerequisites 

Linear Systems 

The practical problems of early civilizations included the measurement of land, the distribution of goods, the 
tracking of resources such as wheat and cattle, and taxation and inheritance calculations. In many cases, these 
problems led to linear systems of equations since linearity is one of the simplest relationships that can exist 
among variables. In this section we present examples from five diverse ancient cultures illustrating how they 
used and solved systems of linear equations. We restrict ourselves to examples before A.D. 500. These examples consequently predate the development of the field of algebra by Islamic/Arab mathematicians, a field that ultimately led in the nineteenth century to the branch of mathematics now called linear algebra.

EXAMPLE 1 Egypt (about 1650 B.C.)




Problem 40 of the Ahmes Papyrus 



The Ahmes (or Rhind) Papyrus is the source of most of our information about ancient Egyptian 
mathematics. This 5-meter-long papyrus contains 84 short mathematical problems, together 
with their solutions, and dates from about 1650 B.C. Problem 40 in this papyrus is the following: 


Divide 100 hekats of barley among five men in arithmetic progression so that the sum of 
the two smallest is one-seventh the sum of the three largest. 

Let $a$ be the least amount that any man obtains, and let $d$ be the common difference of the terms
in the arithmetic progression. Then the other four men receive $a + d$, $a + 2d$, $a + 3d$, and
$a + 4d$ hekats. The two conditions of the problem require that

$$a + (a + d) + (a + 2d) + (a + 3d) + (a + 4d) = 100$$

$$\tfrac{1}{7}\left[(a + 2d) + (a + 3d) + (a + 4d)\right] = a + (a + d)$$

These equations reduce to the following system of two equations in two unknowns:

$$\begin{aligned} 5a + 10d &= 100 \\ 11a - 2d &= 0 \end{aligned} \tag{1}$$

The solution technique described in the papyrus is known as the method of false position or
false assumption. It begins by assuming some convenient value of $a$ (in our case $a = 1$),
substituting that value into the second equation, and obtaining $d = 11/2$. Substituting $a = 1$
and $d = 11/2$ into the left-hand side of the first equation gives 60, whereas the right-hand side
is 100. Adjusting the initial guess for $a$ by multiplying it by $100/60$ leads to the correct value
$a = 5/3$. Substituting $a = 5/3$ into the second equation then gives $d = 55/6$, so the
quantities of barley received by the five men are $10/6$, $65/6$, $120/6$, $175/6$, and $230/6$
hekats. This technique of guessing a value of an unknown and later adjusting it has been used
by many cultures throughout the ages.
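The false-position steps for system (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `false_position_ahmes` is hypothetical, and the sketch rescales both $a$ and $d$ at once, which is equivalent to rescaling $a$ and substituting it back into the second equation.

```python
from fractions import Fraction

def false_position_ahmes(guess=Fraction(1)):
    """Solve 5a + 10d = 100, 11a - 2d = 0 by false position (hypothetical helper)."""
    # Second equation, 11a - 2d = 0, gives d = 11a/2 for the guessed a.
    d = Fraction(11, 2) * guess
    # Left-hand side of the first equation for the trial values (60 when guess = 1).
    trial_total = 5 * guess + 10 * d
    # Both equations are linear with no constant term on the left,
    # so rescaling the trial values by 100 / trial_total fixes the first equation.
    scale = Fraction(100) / trial_total
    a, d = guess * scale, d * scale
    shares = [a + k * d for k in range(5)]
    return a, d, shares

a, d, shares = false_position_ahmes()
print(a, d, [str(s) for s in shares])
```

Exact fractions are used so that the shares can be compared directly with the values quoted in the example.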



EXAMPLE 2 Babylonia (1900-1600 B.C.)




Babylonian clay tablet Ca MLA 1950 



The Old Babylonian Empire flourished in Mesopotamia between 1900 and 1600 B.C. Many clay 
tablets containing mathematical tables and problems survive from that period, one of which 
(designated Ca MLA 1950) contains the next problem. The statement of the problem is a bit 
muddled because of the condition of the tablet, but the diagram and the solution on the tablet 
indicate that the problem is as follows: 



[Figure: diagram from the tablet, with labels 30, 20, and Area = 320]



A trapezoid with an area of 320 square units is cut off from a right triangle by a line 
parallel to one of its sides. The other side has length 50 units, and the height of the 
trapezoid is 20 units. What are the upper and the lower widths of the trapezoid? 



Let $x$ be the lower width of the trapezoid and $y$ its upper width. The area of the trapezoid is its
height times its average width, so $20\left(\frac{x + y}{2}\right) = 320$. Using similar triangles, we also have
$\frac{x}{50} = \frac{y}{30}$. The solution on the tablet uses these relations to generate the linear system

$$\begin{aligned} \tfrac{1}{2}(x + y) &= 16 \\ \tfrac{1}{2}(x - y) &= 4 \end{aligned} \tag{2}$$

Adding and subtracting these two equations then gives the solution $x = 20$ and $y = 12$.



EXAMPLE 3 China (A.D. 263)



Chiu Chang Suan Shu in Chinese characters 

The most important treatise in the history of Chinese mathematics is the Chiu Chang Suan Shu, 
or "The Nine Chapters of the Mathematical Art." This treatise, which is a collection of 246 
problems and their solutions, was assembled in its final form by Liu Hui in A.D. 263. Its 
contents, however, go back to at least the beginning of the Han dynasty in the second century 
B.C. The eighth of its nine chapters, entitled "The Way of Calculating by Arrays," contains 18 
word problems that lead to linear systems in three to six unknowns. The general solution 
procedure described is almost identical to the Gaussian elimination technique developed in
Europe in the nineteenth century by Carl Friedrich Gauss. The first problem in the eighth
chapter is the following: 

There are three classes of corn, of which three bundles of the first class, two of the 
second, and one of the third make 39 measures. Two of the first, three of the second, and 
one of the third make 34 measures. And one of the first, two of the second, and three of 
the third make 26 measures. How many measures of grain are contained in one bundle 
of each class? 



Let $x$, $y$, and $z$ be the measures of the first, second, and third classes of corn. Then the
conditions of the problem lead to the following linear system of three equations in three
unknowns:

$$\begin{aligned} 3x + 2y + z &= 39 \\ 2x + 3y + z &= 34 \\ x + 2y + 3z &= 26 \end{aligned} \tag{3}$$

The solution described in the treatise represented the coefficients of each equation by an 
appropriate number of rods placed within squares on a counting table. Positive coefficients 
were represented by black rods, negative coefficients were represented by red rods, and the 
squares corresponding to zero coefficients were left empty. The counting table was laid out as 
follows so that the coefficients of each equation appear in columns with the first equation in the 
rightmost column: 



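$$\begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline 2 & 3 & 2 \\ \hline 3 & 1 & 1 \\ \hline 26 & 34 & 39 \\ \hline \end{array}$$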



Next, the numbers of rods within the squares were adjusted to accomplish the following two 
steps: (1) two times the numbers of the third column were subtracted from three times the 
numbers in the second column and (2) the numbers in the third column were subtracted from 
three times the numbers in the first column. The result was the following array: 







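$$\begin{array}{|c|c|c|} \hline  &  & 3 \\ \hline 4 & 5 & 2 \\ \hline 8 & 1 & 1 \\ \hline 39 & 24 & 39 \\ \hline \end{array}$$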



In this array, four times the numbers in the second column were subtracted from five times the 
numbers in the first column, yielding 







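$$\begin{array}{|c|c|c|} \hline  &  & 3 \\ \hline  & 5 & 2 \\ \hline 36 & 1 & 1 \\ \hline 99 & 24 & 39 \\ \hline \end{array}$$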



This last array is equivalent to the linear system 



$$\begin{aligned} 3x + 2y + z &= 39 \\ 5y + z &= 24 \\ 36z &= 99 \end{aligned}$$

This triangular system was solved by a method equivalent to back substitution to obtain
$x = 37/4$, $y = 17/4$, and $z = 11/4$.
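The column operations on the counting table can be replayed as a short Python sketch. The variable names (`first`, `second`, `third`) and the helper `combine` are illustrative assumptions, not notation from the treatise; each list holds the entries of one column from top to bottom (coefficients of $x$, $y$, $z$, then the constant).

```python
from fractions import Fraction

# Columns of the counting table; the rightmost ("third") column is the first equation.
third = [3, 2, 1, 39]    # 3x + 2y +  z = 39
second = [2, 3, 1, 34]   # 2x + 3y +  z = 34
first = [1, 2, 3, 26]    #  x + 2y + 3z = 26

def combine(m, u, n, v):
    """Return the column m*u - n*v, computed entry by entry."""
    return [m * p - n * q for p, q in zip(u, v)]

# Step (1): three times the second column minus two times the third column.
second = combine(3, second, 2, third)   # [0, 5, 1, 24]
# Step (2): three times the first column minus the third column.
first = combine(3, first, 1, third)     # [0, 4, 8, 39]
# Final step: five times the first column minus four times the second column.
first = combine(5, first, 4, second)    # [0, 0, 36, 99]

# Back substitution on the resulting triangular system.
z = Fraction(first[3], first[2])
y = (second[3] - second[2] * z) / second[1]
x = (third[3] - third[1] * y - third[2] * z) / third[0]
print(x, y, z)
```

Because only integer multiples of columns are combined, every intermediate array stays in whole numbers, which is what made the procedure practical with counting rods.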

EXAMPLE 4 Greece (third century B.C.)




Archimedes c. 287-212 B.C. 



Perhaps the most famous system of linear equations from antiquity is the one associated with 
the first part of Archimedes' celebrated Cattle Problem. This problem supposedly was posed by 
Archimedes as a challenge to his colleague Eratosthenes. No solution has come down to us 
from ancient times, so that it is not known how, or even whether, either of these two geometers 
solved it. 


If thou art diligent and wise, O stranger, compute the number of cattle of the Sun, who 
once upon a time grazed on the fields of the Thrinacian isle of Sicily, divided into four 
herds of different colors, one milk white, another glossy black, a third yellow, and the 
last dappled. In each herd were bulls, mighty in number according to these proportions: 
Understand, stranger, that the white bulls were equal to a half and a third of the black 
together with the whole of the yellow, while the black were equal to the fourth part of 
the dappled and a fifth, together with, once more, the whole of the yellow. Observe 
further that the remaining bulls, the dappled, were equal to a sixth part of the white and 
a seventh, together with all of the yellow. These were the proportions of the cows: The 
white were precisely equal to the third part and a fourth of the whole herd of the black; 
while the black were equal to the fourth part once more of the dappled and with it a
fifth part, when all, including the bulls, went to pasture together. Now the dappled in
four parts were equal in number to a fifth part and a sixth of the yellow herd. Finally 
the yellow were in number equal to a sixth part and a seventh of the white herd. If thou 
canst accurately tell, O stranger, the number of cattle of the Sun, giving separately the 
number of well-fed bulls and again the number of females according to each color, thou 
wouldst not be called unskilled or ignorant of numbers, but not yet shalt thou be 
numbered among the wise. 



The conventional designation of the eight variables in this problem is 

W = number of white bulls
B = number of black bulls
Y = number of yellow bulls
D = number of dappled bulls
w = number of white cows
b = number of black cows
y = number of yellow cows
d = number of dappled cows

The problem can now be stated as the following seven homogeneous equations in eight
unknowns:

1. $W = \left(\tfrac{1}{2} + \tfrac{1}{3}\right)B + Y$
(The white bulls were equal to a half and a third of the
black [bulls] together with the whole of the yellow
[bulls].)

2. $B = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)D + Y$
(The black [bulls] were equal to the fourth part of the
dappled [bulls] and a fifth, together with, once more, the
whole of the yellow [bulls].)

3. $D = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)W + Y$
(The remaining bulls, the dappled, were equal to a sixth
part of the white [bulls] and a seventh, together with all
of the yellow [bulls].)

4. $w = \left(\tfrac{1}{3} + \tfrac{1}{4}\right)(B + b)$
(The white [cows] were precisely equal to the third part
and a fourth of the whole herd of the black.)

5. $b = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)(D + d)$
(The black [cows] were equal to the fourth part once
more of the dappled and with it a fifth part, when all,
including the bulls, went to pasture together.)

6. $d = \left(\tfrac{1}{5} + \tfrac{1}{6}\right)(Y + y)$
(The dappled [cows] in four parts [that is, in totality]
were equal in number to a fifth part and a sixth of the
yellow herd.)

7. $y = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)(W + w)$
(The yellow [cows] were in number equal to a sixth part
and a seventh of the white herd.)

As we ask you to show in the exercises, this system has infinitely many solutions of the form

$$\begin{aligned}
W &= 10{,}366{,}482k \\
B &= 7{,}460{,}514k \\
Y &= 4{,}149{,}387k \\
D &= 7{,}358{,}060k \\
w &= 7{,}206{,}360k \\
b &= 4{,}893{,}246k \\
y &= 5{,}439{,}213k \\
d &= 3{,}515{,}820k
\end{aligned} \tag{4}$$

where $k$ is any real number. The values $k = 1, 2, \ldots$ give infinitely many positive integer
solutions to the problem, with $k = 1$ giving the smallest solution.
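As a quick check of (4), the following Python sketch substitutes the stated values into the seven equations using exact fractions. The function names `cattle` and `satisfies_conditions` are hypothetical; the sketch verifies the solution rather than deriving it, and the greatest-common-divisor line is only a sanity check that the eight coefficients share no common factor.

```python
from fractions import Fraction as F
from functools import reduce
from math import gcd

def cattle(k=1):
    """Herd sizes (W, B, Y, D, w, b, y, d) from solution (4) for a given k."""
    base = (10366482, 7460514, 4149387, 7358060,
            7206360, 4893246, 5439213, 3515820)
    return tuple(k * v for v in base)

def satisfies_conditions(W, B, Y, D, w, b, y, d):
    """Check the seven proportions of the first part of the Cattle Problem."""
    return all([
        W == (F(1, 2) + F(1, 3)) * B + Y,
        B == (F(1, 4) + F(1, 5)) * D + Y,
        D == (F(1, 6) + F(1, 7)) * W + Y,
        w == (F(1, 3) + F(1, 4)) * (B + b),
        b == (F(1, 4) + F(1, 5)) * (D + d),
        d == (F(1, 5) + F(1, 6)) * (Y + y),
        y == (F(1, 6) + F(1, 7)) * (W + w),
    ])

smallest = cattle(1)
print(satisfies_conditions(*smallest), sum(smallest), reduce(gcd, smallest))
```

Since all seven equations are homogeneous, any multiple `cattle(k)` passes the same check.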



EXAMPLE 5 India (fourth century A.D.)




Fragment III-5-3v of the Bakhshali Manuscript 



The Bakhshali Manuscript is an ancient work of Indian/Hindu mathematics dating from around 
the fourth century A.D., although some of its materials undoubtedly come from many centuries 
before. It consists of about 70 leaves or sheets of birch bark containing mathematical problems 
and their solutions. Many of its problems are so-called equalization problems that lead to 
systems of linear equations. One such problem on the fragment shown is the following: 

One merchant has seven asava horses, a second has nine haya horses, and a third has 
ten camels. They are equally well off in the value of their animals if each gives two 
animals, one to each of the others. Find the price of each animal and the total value of 
the animals possessed by each merchant. 



Let $x$ be the price of an asava horse, let $y$ be the price of a haya horse, let $z$ be the price of a
camel, and let $K$ be the total value of the animals possessed by each merchant. Then the
conditions of the problem lead to the following system of equations:

$$\begin{aligned} 5x + y + z &= K \\ x + 7y + z &= K \\ x + y + 8z &= K \end{aligned} \tag{5}$$

The method of solution described in the manuscript begins by subtracting the quantity
$(x + y + z)$ from both sides of the three equations to obtain $4x = 6y = 7z = K - (x + y + z)$.
This shows that if the prices $x$, $y$, and $z$ are to be integers, then the quantity $K - (x + y + z)$
must be an integer that is divisible by 4, 6, and 7. The manuscript takes the product of these
three numbers, or 168, for the value of $K - (x + y + z)$, which yields $x = 42$, $y = 28$, and
$z = 24$ for the prices and $K = 262$ for the total value. (See Exercise 6 for more solutions to this
problem.)
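The manuscript's reasoning can be sketched in Python as follows. The function name `equalization` and its parameter `common` (standing for the shared value $K - (x + y + z)$) are assumptions made for illustration. Using the least common multiple of 4, 6, and 7 instead of their product gives a smaller integer solution.

```python
from math import gcd

def equalization(common):
    """Given common = K - (x + y + z) = 4x = 6y = 7z, return (x, y, z, K)."""
    if common % 4 or common % 6 or common % 7:
        raise ValueError("common value must be divisible by 4, 6, and 7")
    x, y, z = common // 4, common // 6, common // 7
    K = common + x + y + z
    return x, y, z, K

def lcm(a, b):
    # Least common multiple via the greatest common divisor.
    return a * b // gcd(a, b)

manuscript = equalization(4 * 6 * 7)           # the manuscript's choice, 168
smallest = equalization(lcm(lcm(4, 6), 7))     # least common multiple, 84
print(manuscript, smallest)
```

Both results satisfy the three equations in (5), since each is a multiple of the same one-parameter family of solutions.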



Exercise Set 10.3 

1. The following lines from Book 12 of Homer's Odyssey relate a precursor of Archimedes' Cattle Problem: 

Thou shalt ascend the isle triangular. 

Where many oxen of the Sun are fed. 
And fatted flocks. Of oxen fifty head
In every herd feed, and their herds are seven; 

And of his fat flocks is their number even. 

The last line means that there are as many sheep in all the flocks as there are oxen in all the herds. What is 
the total number of oxen and sheep that belong to the god of the Sun? (This was a difficult problem in 
Homer's day.) 

Answer: 

700 

2. Solve the following problems from the Bakhshali Manuscript. 

(a) B possesses two times as much as A; C has three times as much as A and B together; D has four times 
as much as A, B, and C together. Their total possessions are 300. What is the possession of A? 

(b) B gives 2 times as much as A; C gives 3 times as much as B; D gives 4 times as much as C. Their total 
gift is 132. What is the gift of A? 

Answer: 

(a) 5 

(b) 4 

3. A problem on a Babylonian tablet requires finding the length and width of a rectangle given that the length 
and the width add up to 10, while the length and one-fourth of the width add up to 7. The solution 
provided on the tablet consists of the following four statements: 



Multiply 7 by 4 to obtain 28. 



Take away 10 from 28 to obtain 18. 

Take one-third of 18 to obtain 6, the length. 

Take away 6 from 10 to obtain 4, the width. 

Explain how these steps lead to the answer. 

4. The following two problems are from "The Nine Chapters of the Mathematical Art." Solve them using the 
array technique described in Example 3. 

(a) Five oxen and two sheep are worth 10 units and two oxen and five sheep are worth 8 units. What is the 
value of each ox and sheep? 

(b) There are three kinds of corn. The grains contained in two, three, and four bundles, respectively, of
these three classes of corn, are not sufficient to make a whole measure. However, if we added to them
one bundle of the second, third, and first classes, respectively, then the grains would become one full
measure in each case. How many measures of grain does each bundle of the different classes contain?



Answer: 



(a) Ox, $\tfrac{34}{21}$ units; sheep, $\tfrac{20}{21}$ unit

(b) First kind, $\tfrac{9}{25}$ measure; second kind, $\tfrac{7}{25}$ measure; third kind, $\tfrac{4}{25}$ measure

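5. The problem in part (a) is known as the "Flower of Thymaridas," named after a Pythagorean of the fourth
century B.C.

(a) Given the n numbers $a_1, a_2, \ldots, a_n$, solve for $x_1, x_2, \ldots, x_n$ in the following linear system:

$$\begin{aligned} x_1 + x_2 + \cdots + x_n &= a_1 \\ x_1 + x_2 &= a_2 \\ x_1 + x_3 &= a_3 \\ &\;\;\vdots \\ x_1 + x_n &= a_n \end{aligned}$$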

(b) Identify a problem in this exercise set that fits the pattern in part (a), and solve it using your general 
solution. 



Answer: 

(b) Exercise 7(b); gold, $30\tfrac{1}{2}$ minae; brass, $9\tfrac{1}{2}$ minae; tin, $14\tfrac{1}{2}$ minae; iron, $5\tfrac{1}{2}$ minae

6. For Example 5 from the Bakhshali Manuscript: 

(a) Express Equations 5 as a homogeneous linear system of three equations in four unknowns (x, y, z, and 
K) and show that the solution set has one arbitrary parameter. 

(b) Find the smallest solution for which all four variables are positive integers. 



(c) Show that the solution given in Example 5 is included among your solutions. 



Answer: 

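(a) $5x + y + z - K = 0$
$x + 7y + z - K = 0$
$x + y + 8z - K = 0$

$x = \tfrac{21t}{131}$, $y = \tfrac{14t}{131}$, $z = \tfrac{12t}{131}$, $K = t$, where $t$ is an arbitrary number

(b) Take $t = 131$, so that $x = 21$, $y = 14$, $z = 12$, $K = 131$.

(c) Take $t = 262$, so that $x = 42$, $y = 28$, $z = 24$, $K = 262$.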

7. Solve the problems posed in the following three epigrams, which appear in a collection entitled "The
Greek Anthology," compiled in part by a scholar named Metrodorus around A.D. 500. Some of its 46 
mathematical problems are believed to date as far back as 600 B.C. [Note: Before solving parts (a) and (c), 
you will have to formulate the question.] 

(a) I desire my two sons to receive the thousand staters of which I am possessed, but let the fifth part of 
the legitimate one's share exceed by ten the fourth part of what falls to the illegitimate one. 

(b) Make me a crown weighing sixty minae, mixing gold and brass, and with them tin and much-wrought 
iron. Let the gold and brass together form two-thirds, the gold and tin together three-fourths, and the 
gold and iron three-fifths. Tell me how much gold you must put in, how much brass, how much tin, 
and how much iron, so as to make the whole crown weigh sixty minae. 

(c) First person: I have what the second has and the third of what the third has. Second person: I have 
what the third has and the third of what the first has. Third person: And I have ten minae and the third 
of what the second has. 



Answer: 



(a) Legitimate son, $577\tfrac{7}{9}$ staters; illegitimate son, $422\tfrac{2}{9}$ staters

(b) Gold, $30\tfrac{1}{2}$ minae; brass, $9\tfrac{1}{2}$ minae; tin, $14\tfrac{1}{2}$ minae; iron, $5\tfrac{1}{2}$ minae

(c) First person, 45; second person, $37\tfrac{1}{2}$; third person, $22\tfrac{1}{2}$

Section 10.3 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 



T1.

(a) Solve Archimedes' Cattle Problem using a symbolic algebra program. 

(b) The Cattle Problem has a second part in which two additional conditions are imposed. The first of these 
states that "When the white bulls mingled their number with the black, they stood firm, equal in depth and 
breadth." This requires that $W + B$ be a square number, that is, 1, 4, 9, 16, 25, and so on. Show that this
requires that the values of $k$ in Eq. 4 be restricted as follows:

$$k = 4{,}456{,}749\,r^2, \quad r = 1, 2, 3, \ldots$$

and find the smallest total number of cattle that satisfies this second condition. 

Remark The second condition imposed in the second part of the Cattle Problem states that "When the
yellow and the dappled bulls were gathered into one herd, they stood in such a manner that their number,
beginning from one, grew slowly greater 'til it completed a triangular figure." This requires that the quantity
$Y + D$ be a triangular number, that is, a number of the form $1$, $1 + 2$, $1 + 2 + 3$, $1 + 2 + 3 + 4, \ldots$. This
final part of the problem was not completely solved until 1965 when all 206,545 digits of the smallest
number of cattle that satisfies this condition were found using a computer.

T2. The following problem is from "The Nine Chapters of the Mathematical Art" and determines a 
homogeneous linear system of five equations in six unknowns. Show that the system has infinitely many 
solutions, and find the one for which the depth of the well and the lengths of the five ropes are the smallest 
possible positive integers. 

Suppose that five families share a well. Suppose further that 

2 of A's ropes are short of the well's depth by one of B's ropes. 

3 of B's ropes are short of the well's depth by one of C's ropes. 

4 of C's ropes are short of the well's depth by one of D's ropes. 

5 of D's ropes are short of the well's depth by one of E's ropes. 

6 of E's ropes are short of the well's depth by one of A's ropes. 
10.4 Cubic Spline Interpolation



In this section an artist's drafting aid is used as a physical model for the mathematical problem of finding a curve that passes 
through specified points in the plane. The parameters of the curve are determined by solving a linear system of equations. 



Prerequisites 

Linear Systems 
Matrix Algebra 
Differential Calculus 



Curve Fitting 

Fitting a curve through specified points in the plane is a common problem encountered in analyzing experimental data, in 
ascertaining the relations among variables, and in design work. A ubiquitous application is in the design and description of 
computer and printer fonts, such as PostScript™ and TrueType™ fonts (Figure 10.4.1). In Figure 10.4.2 seven points in the 
xy-plane are displayed, and in Figure 10.4.4 a smooth curve has been drawn that passes through them. A curve that passes 
through a set of points in the plane is said to interpolate those points, and the curve is called an interpolating curve for those 
points. The interpolating curve in Figure 10.4.4 was drawn with the aid of a drafting spline (Figure 10.4.3). This drafting aid 
consists of a thin, flexible strip of wood or other material that is bent to pass through the points to be interpolated. Attached 
sliding weights hold the spline in position while the artist draws the interpolating curve. The drafting spline will serve as the 
physical model for a mathematical theory of interpolation that we will discuss in this section. 




Figure 10.4.1 
Figure 10.4.2




Figure 10.4.3 
Figure 10.4.4

Statement of the Problem 

Suppose that we are given n points in the xy-plane,

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

which we wish to interpolate with a "well-behaved" curve (Figure 10.4.5). For convenience, we take the points to be equally
spaced in the x-direction, although our results can easily be extended to the case of unequally spaced points. If we let the
common distance between the x-coordinates of the points be $h$, then we have

$$x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h$$

Let $y = S(x)$, $x_1 \le x \le x_n$, denote the interpolating curve that we seek. We assume that this curve describes the displacement of
a drafting spline that interpolates the n points when the weights holding down the spline are situated precisely at the n points. It
is known from linear beam theory that for small displacements, the fourth derivative of the displacement of a beam is zero along
any interval of the x-axis that contains no external forces acting on the beam. If we treat our drafting spline as a thin beam and
realize that the only external forces acting on it arise from the weights at the n specified points, then it follows that

$$S^{(4)}(x) = 0 \tag{1}$$

for values of $x$ lying in the $n - 1$ open intervals

$$(x_1, x_2),\ (x_2, x_3),\ \ldots,\ (x_{n-1}, x_n)$$

between the n points.




Figure 10.4.5 

We also need the result from linear beam theory that states that for a beam acted upon only by external forces, the displacement 
must have two continuous derivatives. In the case of the interpolating curve y = S(x) constructed by the drafting spline, this 
means that $S(x)$, $S'(x)$, and $S''(x)$ must be continuous for $x_1 \le x \le x_n$.

The condition that $S''(x)$ be continuous is what causes a drafting spline to produce a pleasing curve, as it results in continuous
curvature. The eye can perceive sudden changes in curvature, that is, discontinuities in $S''(x)$, but sudden changes in higher
derivatives are not discernible. Thus, the condition that $S''(x)$ be continuous is the minimal prerequisite for the interpolating
curve to be perceptible as a single smooth curve, rather than as a series of separate curves pieced together.

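To determine the mathematical form of the function $S(x)$, we observe that because $S^{(4)}(x) = 0$ in the intervals between the n
specified points, it follows by integrating this equation four times that $S(x)$ must be a cubic polynomial in $x$ in each such
interval. In general, however, $S(x)$ will be a different cubic polynomial in each interval, so $S(x)$ must have the form

$$S(x) = \begin{cases} S_1(x), & x_1 \le x \le x_2 \\ S_2(x), & x_2 \le x \le x_3 \\ \quad\vdots \\ S_{n-1}(x), & x_{n-1} \le x \le x_n \end{cases} \tag{2}$$

where $S_1(x), S_2(x), \ldots, S_{n-1}(x)$ are cubic polynomials. For convenience, we will write these in the form

$$\begin{aligned}
S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned} \tag{3}$$

The $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s constitute a total of $4n - 4$ coefficients that we must determine to specify $S(x)$ completely. If we
choose these coefficients so that $S(x)$ interpolates the n specified points in the plane and $S(x)$, $S'(x)$, and $S''(x)$ are
continuous, then the resulting interpolating curve is called a cubic spline.

Derivation of the Formula of a Cubic Spline

From Equations 2 and 3, we have

$$\begin{aligned}
S(x) = S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S(x) = S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S(x) = S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned} \tag{4}$$

so

$$\begin{aligned}
S'(x) = S_1'(x) &= 3a_1(x - x_1)^2 + 2b_1(x - x_1) + c_1, & x_1 \le x \le x_2 \\
S'(x) = S_2'(x) &= 3a_2(x - x_2)^2 + 2b_2(x - x_2) + c_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S'(x) = S_{n-1}'(x) &= 3a_{n-1}(x - x_{n-1})^2 + 2b_{n-1}(x - x_{n-1}) + c_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned} \tag{5}$$

and

$$\begin{aligned}
S''(x) = S_1''(x) &= 6a_1(x - x_1) + 2b_1, & x_1 \le x \le x_2 \\
S''(x) = S_2''(x) &= 6a_2(x - x_2) + 2b_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S''(x) = S_{n-1}''(x) &= 6a_{n-1}(x - x_{n-1}) + 2b_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned} \tag{6}$$

We will now use these equations and the four properties of cubic splines stated below to express the unknown coefficients $a_i$, $b_i$,
$c_i$, $d_i$, $i = 1, 2, \ldots, n - 1$, in terms of the known coordinates $y_1, y_2, \ldots, y_n$.

1. $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$.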



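Because $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, we have

$$S(x_1) = y_1,\quad S(x_2) = y_2,\quad \ldots,\quad S(x_n) = y_n \tag{7}$$

From the first $n - 1$ of these equations and 4, we obtain

$$\begin{aligned} d_1 &= y_1 \\ d_2 &= y_2 \\ &\;\;\vdots \\ d_{n-1} &= y_{n-1} \end{aligned} \tag{8}$$

From the last equation in 7, the last equation in 4, and the fact that $x_n - x_{n-1} = h$, we obtain

$$a_{n-1}h^3 + b_{n-1}h^2 + c_{n-1}h + d_{n-1} = y_n \tag{9}$$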

2. $S(x)$ is continuous on $[x_1, x_n]$.

Because $S(x)$ is continuous for $x_1 \le x \le x_n$, it follows that at each point $x_i$ in the set $x_2, x_3, \ldots, x_{n-1}$ we must have

$$S_{i-1}(x_i) = S_i(x_i), \quad i = 2, 3, \ldots, n - 1 \tag{10}$$

Otherwise, the graphs of $S_{i-1}(x)$ and $S_i(x)$ would not join together to form a continuous curve at $x_i$. When we apply the
interpolating property $S_i(x_i) = y_i$, it follows from 10 that $S_{i-1}(x_i) = y_i$, $i = 2, 3, \ldots, n - 1$, or from 4 that

$$\begin{aligned}
a_1h^3 + b_1h^2 + c_1h + d_1 &= y_2 \\
a_2h^3 + b_2h^2 + c_2h + d_2 &= y_3 \\
&\;\;\vdots \\
a_{n-2}h^3 + b_{n-2}h^2 + c_{n-2}h + d_{n-2} &= y_{n-1}
\end{aligned} \tag{11}$$



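3. $S'(x)$ is continuous on $[x_1, x_n]$.

Because $S'(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

$$S_{i-1}'(x_i) = S_i'(x_i), \quad i = 2, 3, \ldots, n - 1$$

or, from 5,

$$\begin{aligned}
3a_1h^2 + 2b_1h + c_1 &= c_2 \\
3a_2h^2 + 2b_2h + c_2 &= c_3 \\
&\;\;\vdots \\
3a_{n-2}h^2 + 2b_{n-2}h + c_{n-2} &= c_{n-1}
\end{aligned} \tag{12}$$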



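4. $S''(x)$ is continuous on $[x_1, x_n]$.

Because $S''(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

$$S_{i-1}''(x_i) = S_i''(x_i), \quad i = 2, 3, \ldots, n - 1$$

or, from 6,

$$\begin{aligned}
6a_1h + 2b_1 &= 2b_2 \\
6a_2h + 2b_2 &= 2b_3 \\
&\;\;\vdots \\
6a_{n-2}h + 2b_{n-2} &= 2b_{n-1}
\end{aligned} \tag{13}$$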



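Equations 8, 9, 11, 12, and 13 constitute a system of $4n - 6$ linear equations in the $4n - 4$ unknown coefficients $a_i$, $b_i$, $c_i$, $d_i$,
$i = 1, 2, \ldots, n - 1$. Consequently, we need two more equations to determine these coefficients uniquely. Before obtaining these
additional equations, however, we can simplify our existing system by expressing the unknowns $a_i$, $b_i$, $c_i$, and $d_i$ in terms of
new unknown quantities

$$M_1 = S''(x_1),\quad M_2 = S''(x_2),\quad \ldots,\quad M_n = S''(x_n)$$

and the known quantities

$$y_1, y_2, \ldots, y_n$$

For example, from 6 it follows that

$$M_1 = 2b_1,\quad M_2 = 2b_2,\quad \ldots,\quad M_{n-1} = 2b_{n-1}$$

so

$$b_1 = \tfrac{1}{2}M_1,\quad b_2 = \tfrac{1}{2}M_2,\quad \ldots,\quad b_{n-1} = \tfrac{1}{2}M_{n-1}$$

Moreover, we already know from 8 that

$$d_1 = y_1,\quad d_2 = y_2,\quad \ldots,\quad d_{n-1} = y_{n-1}$$

We leave it as an exercise for you to derive the expressions for the $a_i$'s and $c_i$'s in terms of the $M_i$'s and $y_i$'s. The final result is
as follows: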



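THEOREM 10.4.1 Cubic Spline Interpolation

Given n points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with $x_{i+1} - x_i = h$, $i = 1, 2, \ldots, n - 1$, the cubic spline

$$S(x) = \begin{cases}
a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
\quad\vdots \\
a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{cases}$$

that interpolates these points has coefficients given by

$$\begin{aligned}
a_i &= (M_{i+1} - M_i)/6h \\
b_i &= M_i/2 \\
c_i &= (y_{i+1} - y_i)/h - [(M_{i+1} + 2M_i)h/6] \\
d_i &= y_i
\end{aligned} \tag{14}$$

for $i = 1, 2, \ldots, n - 1$, where $M_i = S''(x_i)$, $i = 1, 2, \ldots, n$.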



From this result, we see that the quantities $M_1, M_2, \ldots, M_n$ uniquely determine the cubic spline. To find these quantities, we
substitute the expressions for $a_i$, $b_i$, and $c_i$ given in 14 into 12. After some algebraic simplification, we obtain

$$\begin{aligned}
M_1 + 4M_2 + M_3 &= 6(y_1 - 2y_2 + y_3)/h^2 \\
M_2 + 4M_3 + M_4 &= 6(y_2 - 2y_3 + y_4)/h^2 \\
&\;\;\vdots \\
M_{n-2} + 4M_{n-1} + M_n &= 6(y_{n-2} - 2y_{n-1} + y_n)/h^2
\end{aligned} \tag{15}$$



or, in matrix form,

$$\begin{bmatrix}
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 4 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 4 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-3} \\ M_{n-2} \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
y_3 - 2y_4 + y_5 \\
\vdots \\
y_{n-4} - 2y_{n-3} + y_{n-2} \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}$$


This is a linear system of $n - 2$ equations for the n unknowns $M_1, M_2, \ldots, M_n$. Thus, we still need two additional equations to
determine $M_1, M_2, \ldots, M_n$ uniquely. The reason for this is that there are infinitely many cubic splines that interpolate the
given points, so we simply do not have enough conditions to determine a unique cubic spline passing through the points. We
discuss below three possible ways of specifying the two additional conditions required to obtain a unique cubic spline through
the points. (The exercises present two more.) They are summarized in Table 1.

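Table 1

Natural Spline
Description: The second derivative of the spline is zero at the endpoints.
End conditions: $M_1 = 0$, $M_n = 0$
Linear system:

$$\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}$$

Parabolic Runout Spline
Description: The spline reduces to a parabolic curve on the first and last intervals.
End conditions: $M_1 = M_2$, $M_n = M_{n-1}$
Linear system:

$$\begin{bmatrix}
5 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}$$

Cubic Runout Spline
Description: The spline is a single cubic curve on the first two and last two intervals.
End conditions: $M_1 = 2M_2 - M_3$, $M_n = 2M_{n-1} - M_{n-2}$
Linear system:

$$\begin{bmatrix}
6 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}$$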

The Natural Spline 

The two simplest mathematical conditions we can impose are 

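$$M_1 = M_n = 0$$

These conditions together with 15 result in an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$, which can be written in matrix form as

$$\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} 0 \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \\ 0 \end{bmatrix}$$

For numerical calculations it is more convenient to eliminate $M_1$ and $M_n$ from this system and write

$$\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix} \tag{16}$$

together with

$$M_1 = 0 \tag{17}$$

$$M_n = 0 \tag{18}$$

Thus, the $(n - 2) \times (n - 2)$ linear system can be solved for the $n - 2$ coefficients $M_2, M_3, \ldots, M_{n-1}$, and $M_1$ and $M_n$ are
determined by 17 and 18.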

Physically, the natural spline results when the ends of a drafting spline extend freely beyond the interpolating points without
constraint. The end portions of the spline outside the interpolating points will fall on straight line paths, causing $S''(x)$ to
vanish at the endpoints $x_1$ and $x_n$ and resulting in the mathematical conditions $M_1 = M_n = 0$.

The natural spline tends to flatten the interpolating curve at the endpoints, which may be undesirable. Of course, if it is required
that $S''(x)$ vanish at the endpoints, then the natural spline must be used.



The Parabolic Runout Spline 

The two additional constraints imposed for this type of spline are 

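$$M_1 = M_2 \tag{19}$$

$$M_n = M_{n-1} \tag{20}$$

If we use the preceding two equations to eliminate $M_1$ and $M_n$ from 15, we obtain the $(n - 2) \times (n - 2)$ linear system

$$\begin{bmatrix}
5 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix} \tag{21}$$

for $M_2, M_3, \ldots, M_{n-1}$. Once these $n - 2$ values have been determined, $M_1$ and $M_n$ are determined from 19 and 20.

From 14 we see that $M_1 = M_2$ implies that $a_1 = 0$, and $M_n = M_{n-1}$ implies that $a_{n-1} = 0$. Thus, from 3 there are no cubic
terms in the formula for the spline over the end intervals $[x_1, x_2]$ and $[x_{n-1}, x_n]$. Hence, as the name suggests, the parabolic
runout spline reduces to a parabolic curve over these end intervals.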



The Cubic Runout Spline 



For this type of spline, we impose the two additional conditions 

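\[ M_1 = 2M_2 - M_3 \tag{22} \]

\[ M_n = 2M_{n-1} - M_{n-2} \tag{23} \]

Using these two equations to eliminate $M_1$ and $M_n$ from 15 results in the following $(n-2) \times (n-2)$ linear system for $M_2, M_3, \ldots, M_{n-1}$:

\[
\begin{bmatrix}
6 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}
\tag{24}
\]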



After we solve this linear system for $M_2, M_3, \ldots, M_{n-1}$, we can use 22 and 23 to determine $M_1$ and $M_n$.

If we rewrite 22 as

\[ M_2 - M_1 = M_3 - M_2 \]

it follows from 14 that $a_1 = a_2$. Because $S'''(x) = 6a_1$ on $[x_1, x_2]$ and $S'''(x) = 6a_2$ on $[x_2, x_3]$, we see that $S'''(x)$ is constant over the entire interval $[x_1, x_3]$. Consequently, $S(x)$ consists of a single cubic curve over the interval $[x_1, x_3]$ rather than two different cubic curves pieced together at $x_2$. [To see this, integrate $S'''(x)$ three times.] A similar analysis shows that $S(x)$ consists of a single cubic curve over the last two intervals.

Whereas the natural spline tends to produce an interpolating curve that is flat at the endpoints, the cubic runout spline has the 
opposite tendency: it produces a curve with pronounced curvature at the endpoints. If neither behavior is desired, the parabolic 
runout spline is a reasonable compromise. 
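The three end conditions differ only in the first and last rows of the $(n-2) \times (n-2)$ systems 16, 21, and 24, so they can be handled by one routine. The following is a minimal sketch in Python under stated assumptions: the data are equally spaced with spacing h, the function names `solve_linear` and `spline_second_derivatives` are hypothetical, and a plain Gaussian elimination stands in for whatever solver a technology utility would provide.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def spline_second_derivatives(y, h, kind="natural"):
    """Return [M_1, ..., M_n] for equally spaced data y with spacing h.

    kind is "natural" (system 16), "parabolic" (21), or "cubic" (24).
    """
    n = len(y)
    if kind not in ("natural", "parabolic", "cubic"):
        raise ValueError("unknown spline type: %r" % kind)
    if n < 3 or (kind != "natural" and n < 4):
        raise ValueError("too few data points for this spline type")
    m = n - 2  # number of interior unknowns M_2, ..., M_{n-1}
    # Interior rows of the coefficient matrix follow the pattern 1 4 1.
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = 1.0
        if i < m - 1:
            A[i][i + 1] = 1.0
    # The end conditions change only the first and last rows.
    if kind == "parabolic":
        A[0][0] = A[m - 1][m - 1] = 5.0
    elif kind == "cubic":
        A[0][0] = A[m - 1][m - 1] = 6.0
        A[0][1] = A[m - 1][m - 2] = 0.0
    rhs = [6.0 * (y[i] - 2.0 * y[i + 1] + y[i + 2]) / h**2 for i in range(m)]
    inner = solve_linear(A, rhs)
    # Recover M_1 and M_n from the end conditions.
    if kind == "natural":
        first, last = 0.0, 0.0                    # equations 17 and 18
    elif kind == "parabolic":
        first, last = inner[0], inner[-1]         # equations 19 and 20
    else:
        first = 2.0 * inner[0] - inner[1]         # equation 22
        last = 2.0 * inner[-1] - inner[-2]        # equation 23
    return [first] + inner + [last]
```

Given the returned values, the coefficients of each cubic piece follow from the formulas in 14. A dense solver is used here only for brevity; because the coefficient matrix is tridiagonal, a specialized tridiagonal solver would be the more economical choice for large n.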

EXAMPLE 1 Using a Parabolic Runout Spline

The density of water is well known to reach a maximum at a temperature slightly above freezing. Table 2, from the Handbook of Chemistry and Physics (CRC Press, 2009), gives the density of water in grams per cubic centimeter for five equally spaced temperatures from -10°C to 30°C. We will interpolate these five temperature-density measurements with a parabolic runout spline and attempt to find the maximum density of water in this range by finding the maximum value on this cubic spline. In the exercises we ask you to perform similar calculations using a natural spline and a cubic runout spline to interpolate the data points.

Table 2

Temperature (°C)    Density (g/cm^3)
      -10               .99815
        0               .99987
       10               .99973
       20               .99823
       30               .99567



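Set

\[
\begin{aligned}
x_1 &= -10, & y_1 &= .99815 \\
x_2 &= 0,   & y_2 &= .99987 \\
x_3 &= 10,  & y_3 &= .99973 \\
x_4 &= 20,  & y_4 &= .99823 \\
x_5 &= 30,  & y_5 &= .99567
\end{aligned}
\]

Then

\[
\begin{aligned}
6[y_1 - 2y_2 + y_3]/h^2 &= -.0001116 \\
6[y_2 - 2y_3 + y_4]/h^2 &= -.0000816 \\
6[y_3 - 2y_4 + y_5]/h^2 &= -.0000636
\end{aligned}
\]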

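and the linear system 21 for the parabolic runout spline becomes

\[
\begin{bmatrix} 5 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 5 \end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \end{bmatrix}
=
\begin{bmatrix} -.0001116 \\ -.0000816 \\ -.0000636 \end{bmatrix}
\]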



Solving this system yields

\[
\begin{aligned}
M_2 &= -.00001973 \\
M_3 &= -.00001293 \\
M_4 &= -.00001013
\end{aligned}
\]

From 19 and 20, we have

\[
\begin{aligned}
M_1 &= M_2 = -.00001973 \\
M_5 &= M_4 = -.00001013
\end{aligned}
\]

Solving for the $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s in 14, we obtain the following expression for the interpolating parabolic runout spline:

\[
S(x) =
\begin{cases}
-.00000987(x+10)^2 + .0002707(x+10) + .99815, & -10 \le x \le 0 \\
.000000113(x-0)^3 - .00000987(x-0)^2 + .0000733(x-0) + .99987, & 0 \le x \le 10 \\
.000000047(x-10)^3 - .00000647(x-10)^2 - .0000900(x-10) + .99973, & 10 \le x \le 20 \\
-.00000507(x-20)^2 - .0002053(x-20) + .99823, & 20 \le x \le 30
\end{cases}
\]

This spline is plotted in Figure 10.4.6. From that figure we see that the maximum is attained in the interval [0, 10]. To find this maximum, we set $S'(x)$ equal to zero in the interval [0, 10]:

\[ S'(x) = .000000339x^2 - .0000197x + .0000733 = 0 \]

To three significant digits the root of this quadratic in the interval [0, 10] is $x = 3.99$, and for this value of x, $S(3.99) = 1.00001$. Thus, according to our interpolated estimate, the maximum density of water is 1.00001 g/cm^3 attained at 3.99°C. This agrees well with the experimental maximum density of 1.00000 g/cm^3 attained at 3.98°C. (In the original metric system, the gram was defined as the mass of one cubic

centimeter of water at its maximum density.) 
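The maximization step in Example 1 can be checked numerically. The sketch below is illustrative only: it copies the rounded coefficients of the spline piece on [0, 10] from the example, finds the roots of $S'(x)$ with the quadratic formula, and compares the critical point with the interval endpoints. The names `S` and `critical_points_in` are hypothetical. Because the coefficients are rounded, the computed location comes out near 3.98 rather than exactly the 3.99 quoted in the example.

```python
import math

# Coefficients of the spline piece on [0, 10], copied from Example 1:
# S(x) = a*x**3 + b*x**2 + c*x + d
a, b, c, d = 0.000000113, -0.00000987, 0.0000733, 0.99987


def S(x):
    """Evaluate the cubic piece by Horner's rule."""
    return ((a * x + b) * x + c) * x + d


def critical_points_in(lo, hi):
    """Roots of S'(x) = 3a x^2 + 2b x + c that lie in [lo, hi]."""
    A, B, C = 3 * a, 2 * b, c
    disc = B * B - 4 * A * C
    if disc < 0:
        return []
    root = math.sqrt(disc)
    roots = [(-B - root) / (2 * A), (-B + root) / (2 * A)]
    return [r for r in roots if lo <= r <= hi]


# The maximum over a closed interval occurs at a critical point or an endpoint.
candidates = critical_points_in(0.0, 10.0) + [0.0, 10.0]
x_max = max(candidates, key=S)
print(round(x_max, 2), round(S(x_max), 5))
```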




Figure 10.4.6 (plot of the interpolating spline against Temperature (°C), for temperatures from -10 to 30)



Closing Remarks 

In addition to producing excellent interpolating curves, cubic splines and their generalizations are useful for numerical 
integration and differentiation, for the numerical solution of differential and integral equations, and in optimization theory. 

Exercise Set 10.4 

1. Derive the expressions for and in Equations 14 of Theorem 10.4.1. 

2. The six points

(0, .00000), (.2, .19867), (.4, .38942),
(.6, .56464), (.8, .71736), (1.0, .84147)

lie on the graph of $y = \sin x$, where x is in radians.

(a) Find the portion of the parabolic runout spline that interpolates these six points for $.4 \le x \le .6$. Maintain an accuracy of five decimal places in your calculations.

(b) Calculate $S(.5)$ for the spline you found in part (a). What is the percentage error of $S(.5)$ with respect to the "exact" value of $\sin(.5) = .47943$?

Answer:

(a) $S(x) = -.12643(x - .4)^3 - .20211(x - .4)^2 + .92158(x - .4) + .38942$

(b) $S(.5) = .47943$; error = 0%

3. The following five points 

(0, 1), (1,7), (2, 27), (3,79), (4, 181) 

lie on a single cubic curve. 

(a) Which of the three types of cubic splines (natural, parabolic runout, or cubic runout) would agree exactly with the single 
cubic curve on which the five points lie? 

(b) Determine the cubic spline you chose in part (a), and verify that it is a single cubic curve that interpolates the five points. 
Answer: 



(a) The cubic runout spline 



(b) $S(x) = 3x^3 - 2x^2 + 5x + 1$

4. Repeat the calculations in Example 1 using a natural spline to interpolate the five data points. 
Answer: 



\[
S(x) =
\begin{cases}
-.00000042(x+10)^3 + .000214(x+10) + .99815, & -10 \le x \le 0 \\
.00000024(x)^3 - .0000126(x)^2 + .000088(x) + .99987, & 0 \le x \le 10 \\
-.00000004(x-10)^3 - .0000054(x-10)^2 - .000092(x-10) + .99973, & 10 \le x \le 20 \\
.00000022(x-20)^3 - .0000066(x-20)^2 - .000212(x-20) + .99823, & 20 \le x \le 30
\end{cases}
\]

Maximum at $(x, S(x)) = (3.93, 1.00004)$

5. Repeat the calculations in Example 1 using a cubic runout spline to interpolate the five data points. 



Answer: 



\[
S(x) =
\begin{cases}
.00000009(x+10)^3 - .0000121(x+10)^2 + .000282(x+10) + .99815, & -10 \le x \le 0 \\
.00000009(x)^3 - .0000093(x)^2 + .000070(x) + .99987, & 0 \le x \le 10 \\
.00000004(x-10)^3 - .0000066(x-10)^2 - .000087(x-10) + .99973, & 10 \le x \le 20 \\
.00000004(x-20)^3 - .0000053(x-20)^2 - .000207(x-20) + .99823, & 20 \le x \le 30
\end{cases}
\]

Maximum at $(x, S(x)) = (4.00, 1.00001)$

6. Consider the five points (0, 0), (.5, 1), (1, 0), (1.5, -1), and (2, 0) on the graph of $y = \sin(\pi x)$.

(a) Use a natural spline to interpolate the data points (0, 0), (.5, 1), and (1, 0). 

(b) Use a natural spline to interpolate the data points (.5, 1),(1,0), and (1.5, — 1 ) . 

(c) Explain the unusual nature of your result in part (b). 



Answer: 

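(a)
\[
S(x) =
\begin{cases}
-4x^3 + 3x, & 0 \le x \le 0.5 \\
4x^3 - 12x^2 + 9x - 1, & 0.5 \le x \le 1
\end{cases}
\]

(b)
\[
S(x) =
\begin{cases}
2 - 2x, & 0.5 \le x \le 1 \\
2 - 2x, & 1 \le x \le 1.5
\end{cases}
\]

(c) The three data points are collinear.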

7. (The Periodic Spline) If it is known or if it is desired that the n points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ to be interpolated lie on a single cycle of a periodic curve with period $x_n - x_1$, then an interpolating cubic spline $S(x)$ must satisfy

\[
\begin{aligned}
S(x_1) &= S(x_n) \\
S'(x_1) &= S'(x_n) \\
S''(x_1) &= S''(x_n)
\end{aligned}
\]

(a) Show that these three periodicity conditions require that

\[
\begin{aligned}
y_1 &= y_n \\
M_1 &= M_n \\
4M_1 + M_2 + M_{n-1} &= \frac{6(y_{n-1} - 2y_1 + y_2)}{h^2}
\end{aligned}
\]

(b) Using the three equations in part (a) and Equations 15, construct an $(n-1) \times (n-1)$ linear system for $M_1, M_2, \ldots, M_{n-1}$ in matrix form.



Answer: 



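(b)
\[
\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_{n-1} - 2y_1 + y_2 \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_1 \end{bmatrix}
\]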



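8. (The Clamped Spline) Suppose that, in addition to the n points to be interpolated, we are given specific values $y_1'$ and $y_n'$ for the slopes $S'(x_1)$ and $S'(x_n)$ of the interpolating cubic spline at the endpoints $x_1$ and $x_n$.

(a) Show that

\[
\begin{aligned}
2M_1 + M_2 &= \frac{6(y_2 - y_1 - h y_1')}{h^2} \\
2M_n + M_{n-1} &= \frac{6(y_{n-1} - y_n + h y_n')}{h^2}
\end{aligned}
\]

(b) Using the equations in part (a) and Equations 15, construct an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$ in matrix form.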



Remark The clamped spline described in this exercise is the most accurate type of spline for interpolation work if the 
slopes at the endpoints are known or can be estimated. 

Answer: 



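(b)
\[
\begin{bmatrix}
2 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} -h y_1' - y_1 + y_2 \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n \\ y_{n-1} - y_n + h y_n' \end{bmatrix}
\]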



Section 10.4 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, 
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some 
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 



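T1. In the solution of the natural cubic spline problem, it is necessary to solve a system of equations having coefficient matrix

\[
A_n =
\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\]

If we can present a formula for the inverse of this matrix, then the solution for the natural cubic spline problem can be easily obtained. In this exercise and the next, we use a computer to discover this formula. Toward this end, we first determine an expression for the determinant of $A_n$, denoted by the symbol $D_n$. Given that

\[ A_1 = [4] \quad \text{and} \quad A_2 = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} \]

we see that

\[ D_1 = \det(A_1) = \det[4] = 4 \]

and

\[ D_2 = \det(A_2) = \det \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} = 15 \]

(a) Use the cofactor expansion of determinants to show that

\[ D_n = 4D_{n-1} - D_{n-2} \]

for $n = 3, 4, 5, \ldots$. This says, for example, that

\[
\begin{aligned}
D_3 &= 4D_2 - D_1 = 4(15) - 4 = 56 \\
D_4 &= 4D_3 - D_2 = 4(56) - 15 = 209
\end{aligned}
\]

and so on. Using a computer, check this result for $5 \le n \le 10$.

(b) By writing

\[ D_n = 4D_{n-1} - D_{n-2} \]

and the identity $D_{n-1} = D_{n-1}$ in matrix form,

\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
= \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} D_{n-1} \\ D_{n-2} \end{bmatrix}
\]

show that

\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
= \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} D_2 \\ D_1 \end{bmatrix}
= \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} 15 \\ 4 \end{bmatrix}
\]

(c) Use the methods in Section 5.2 and a computer to show that

\[
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
= \frac{1}{2\sqrt{3}}
\begin{bmatrix}
(2+\sqrt{3})^{n-1} - (2-\sqrt{3})^{n-1} & (2-\sqrt{3})^{n-2} - (2+\sqrt{3})^{n-2} \\
(2+\sqrt{3})^{n-2} - (2-\sqrt{3})^{n-2} & (2-\sqrt{3})^{n-3} - (2+\sqrt{3})^{n-3}
\end{bmatrix}
\]

and hence

\[ D_n = \frac{(2+\sqrt{3})^{n+1} - (2-\sqrt{3})^{n+1}}{2\sqrt{3}} \]

for $n = 1, 2, 3, \ldots$.

(d) Using a computer, check this result for $1 \le n \le 10$.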

T2. In this exercise, we determine a formula for calculating $A_n^{-1}$ from $D_k$ for $k = 0, 1, 2, 3, \ldots, n$, assuming that $D_0$ is defined to be 1.

(a) Use a computer to compute $A_k^{-1}$ for $k = 1, 2, 3, 4,$ and 5.



(b) From your results in part (a), discover the conjecture that 
where ^ij = ^ji and 



= [Of) 



for i 

(c) Use the result in part (b) to compute Jj^^ and compare it to the result obtained using the computer. 



Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



10.5 Markov Chains 



In this section we describe a general model of a system that changes from state to state. We then apply the model 
to several concrete problems. 



Prerequisites 

Linear Systems 
Matrices 

Intuitive Understanding of Limits 



A Markov Process 



Suppose a physical or mathematical system undergoes a process of change such that at any moment it can occupy 
one of a finite number of states. For example, the weather in a certain city could be in one of three possible 
states: sunny, cloudy, or rainy. Or an individual could be in one of four possible emotional states: happy, sad, 
angry, or apprehensive. Suppose that such a system changes with time from one state to another and at scheduled 
times the state of the system is observed. If the state of the system at any observation cannot be predicted with 
certainty, but the probability that a given state occurs can be predicted by just knowing the state of the system at 
the preceding observation, then the process of change is called a Markov chain or Markov process. 

DEFINITION 1

If a Markov chain has k possible states, which we label as 1, 2, ..., k, then the probability that the system is in state i at any observation after it was in state j at the preceding observation is denoted by $p_{ij}$ and is called the transition probability from state j to state i. The matrix $P = [p_{ij}]$ is called the transition matrix of the Markov chain.


For example, in a three-state Markov chain, the transition matrix has the form

              Preceding State
               1       2       3
         1 [ p_11    p_12    p_13 ]
New      2 [ p_21    p_22    p_23 ]
State    3 [ p_31    p_32    p_33 ]

In this matrix, $p_{32}$ is the probability that the system will change from state 2 to state 3, $p_{11}$ is the probability that the system will still be in state 1 if it was previously in state 1, and so forth.



EXAMPLE 1 Transition Matrix of the Markov Chain

A car rental agency has three rental locations, denoted by 1, 2, and 3. A customer may rent a car from any of the three locations and return the car to any of the three locations. The manager finds that customers return the cars to the various locations according to the following probabilities:

                 Rented from Location
                   1      2      3
             1 [  .8     .3     .2 ]
Returned to  2 [  .1     .2     .6 ]
Location     3 [  .1     .5     .2 ]

This matrix is the transition matrix of the system considered as a Markov chain. From this matrix, the probability is .6 that a car rented from location 3 will be returned to location 2, the probability is .8 that a car rented from location 1 will be returned to location 1, and so forth.



EXAMPLE 2 Transition Matrix of the Markov Chain

By reviewing its donation records, the alumni office of a college finds that 80% of its alumni who contribute to the annual fund one year will also contribute the next year, and 30% of those who do not contribute one year will contribute the next. This can be viewed as a Markov chain with two states: state 1 corresponds to an alumnus giving a donation in any one year, and state 2 corresponds to the alumnus not giving a donation in that year. The transition matrix is

\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]

In the examples above, the transition matrices of the Markov chains have the property that the entries in any column sum to 1. This is not accidental. If $P = [p_{ij}]$ is the transition matrix of any Markov chain with k states, then for each j we must have

\[ p_{1j} + p_{2j} + \cdots + p_{kj} = 1 \tag{1} \]

because if the system is in state j at one observation, it is certain to be in one of the k possible states at the next observation.

A matrix with property 1 is called a stochastic matrix, a probability matrix, or a Markov matrix. From the 
preceding discussion, it follows that the transition matrix for a Markov chain must be a stochastic matrix. 
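
Property 1 is easy to test mechanically. The following is a small sketch in Python, with the matrix stored as a list of rows; the function name `is_stochastic` and the tolerance parameter are assumptions made for illustration, and the check also requires nonnegative entries, since the entries are probabilities.

```python
def is_stochastic(P, tol=1e-12):
    """True if P is square, has nonnegative entries, and every column sums to 1."""
    k = len(P)
    if any(len(row) != k for row in P):
        return False
    if any(entry < 0 for row in P for entry in row):
        return False
    # Property 1: p_1j + p_2j + ... + p_kj = 1 for each column j.
    return all(abs(sum(P[i][j] for i in range(k)) - 1.0) <= tol
               for j in range(k))


# Transition matrix of Example 1 (car rental locations).
cars = [[0.8, 0.3, 0.2],
        [0.1, 0.2, 0.6],
        [0.1, 0.5, 0.2]]
print(is_stochastic(cars))                      # True
print(is_stochastic([[0.8, 0.3], [0.3, 0.7]]))  # False: first column sums to 1.1
```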

In a Markov chain, the state of the system at any observation time cannot generally be determined with certainty. The best one can usually do is specify probabilities for each of the possible states. For example, in a Markov chain with three states, we might describe the possible state of the system at some observation time by a column vector

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

in which $x_1$ is the probability that the system is in state 1, $x_2$ the probability that it is in state 2, and $x_3$ the probability that it is in state 3. In general we make the following definition.

DEFINITION 2

The state vector for an observation of a Markov chain with k states is a column vector x whose ith component $x_i$ is the probability that the system is in the ith state at that time.

Observe that the entries in any state vector for a Markov chain are nonnegative and have a sum of 1. (Why?) A column vector that has this property is called a probability vector.

Let us suppose now that we know the state vector $\mathbf{x}^{(0)}$ for a Markov chain at some initial observation. The following theorem will enable us to determine the state vectors

\[ \mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}, \ldots \]

at the subsequent observation times.



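THEOREM 10.5.1

If P is the transition matrix of a Markov chain and $\mathbf{x}^{(n)}$ is the state vector at the nth observation, then

\[ \mathbf{x}^{(n+1)} = P\mathbf{x}^{(n)} \]

The proof of this theorem involves ideas from probability theory and will not be given here. From this theorem, it follows that

\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = P^2\mathbf{x}^{(0)} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = P^3\mathbf{x}^{(0)} \\
&\;\;\vdots \\
\mathbf{x}^{(n)} &= P\mathbf{x}^{(n-1)} = P^n\mathbf{x}^{(0)}
\end{aligned}
\]

In this way, the initial state vector $\mathbf{x}^{(0)}$ and the transition matrix P determine $\mathbf{x}^{(n)}$ for $n = 1, 2, \ldots$.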



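EXAMPLE 3 Example 2 Revisited

The transition matrix in Example 2 was

\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]

We now construct the probable future donation record of a new graduate who did not give a donation in the initial year after graduation. For such a graduate the system is initially in state 2 with certainty, so the initial state vector is

\[ \mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

From Theorem 10.5.1 we then have

\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} .3 \\ .7 \end{bmatrix} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .3 \\ .7 \end{bmatrix} = \begin{bmatrix} .45 \\ .55 \end{bmatrix} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .45 \\ .55 \end{bmatrix} = \begin{bmatrix} .525 \\ .475 \end{bmatrix}
\end{aligned}
\]

Thus, after three years the alumnus can be expected to make a donation with probability .525. Beyond three years, we find the following state vectors (to three decimal places):

\[
\mathbf{x}^{(4)} = \begin{bmatrix} .563 \\ .438 \end{bmatrix}, \quad
\mathbf{x}^{(5)} = \begin{bmatrix} .581 \\ .419 \end{bmatrix}, \quad
\mathbf{x}^{(6)} = \begin{bmatrix} .591 \\ .409 \end{bmatrix}, \quad
\mathbf{x}^{(7)} = \begin{bmatrix} .595 \\ .405 \end{bmatrix}
\]

\[
\mathbf{x}^{(8)} = \begin{bmatrix} .598 \\ .402 \end{bmatrix}, \quad
\mathbf{x}^{(9)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix}, \quad
\mathbf{x}^{(10)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix}, \quad
\mathbf{x}^{(11)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix}
\]

For all n beyond 11, we have

\[ \mathbf{x}^{(n)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix} \]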



to three decimal places. In other words, the state vectors converge to a fixed vector as the number of 
observations increases. (We will discuss this further below.) 
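
The repeated use of Theorem 10.5.1 in this example can be sketched in a few lines of Python. This is an illustration only; the helper names `mat_vec` and `state_vectors` are hypothetical, and the matrix is stored as a list of rows.

```python
def mat_vec(P, x):
    """Matrix-vector product P x for a matrix stored as a list of rows."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]


def state_vectors(P, x0, steps):
    """Return [x(0), x(1), ..., x(steps)] using x(n+1) = P x(n)."""
    history = [list(x0)]
    for _ in range(steps):
        history.append(mat_vec(P, history[-1]))
    return history


# Alumni donation chain: state 1 = donates, state 2 = does not donate.
P = [[0.8, 0.3],
     [0.2, 0.7]]
history = state_vectors(P, [0.0, 1.0], 11)
for n, x in enumerate(history):
    print(n, [round(v, 3) for v in x])
```

Printing the rounded vectors shows the first component climbing toward .6 while the second falls toward .4, which is the convergence described in the example.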



EXAMPLE 4 Example 1 Revisited

The transition matrix in Example 1 was

\[ P = \begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix} \]

If a car is rented initially from location 2, then the initial state vector is

\[ \mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

Using this vector and Theorem 10.5.1, one obtains the later state vectors listed in Table 1.

Table 1

  n        0     1     2     3     4     5     6     7     8     9     10    11
  x1(n)    0    .300  .400  .477  .511  .533  .544  .550  .553  .555  .556  .557
  x2(n)    1    .200  .370  .252  .261  .240  .238  .233  .232  .231  .230  .230
  x3(n)    0    .500  .230  .271  .228  .227  .219  .217  .215  .214  .214  .213

For all values of n greater than 11, all state vectors are equal to $\mathbf{x}^{(11)}$ to three decimal places.

Two things should be observed in this example. First, it was not necessary to know how long a customer kept the car. That is, in a Markov process the time period between observations need not be regular. Second, the state vectors approach a fixed vector as n increases, just as in the first example.



EXAMPLE 5 Using Theorem 10.5.1

A traffic officer is assigned to control the traffic at the eight intersections indicated in Figure 10.5.1. She is instructed to remain at each intersection for an hour and then to either remain at the same intersection or move to a neighboring intersection. To avoid establishing a pattern, she is told to choose her new intersection on a random basis, with each possible choice equally likely. For example, if she is at intersection 5, her next intersection can be 2, 4, 5, or 8, each with probability 1/4. Every day she starts at the location where she stopped the day before. The transition matrix for this Markov chain is

                               Old Intersection
                    1     2     3     4     5     6     7     8
              1 [  1/3   1/3    0    1/5    0     0     0     0  ]
              2 [  1/3   1/3    0     0    1/4    0     0     0  ]
              3 [   0     0    1/3   1/5    0    1/3    0     0  ]
New           4 [  1/3    0    1/3   1/5   1/4    0    1/4    0  ]
Intersection  5 [   0    1/3    0    1/5   1/4    0     0    1/3 ]
              6 [   0     0    1/3    0     0    1/3   1/4    0  ]
              7 [   0     0     0    1/5    0    1/3   1/4   1/3 ]
              8 [   0     0     0     0    1/4    0    1/4   1/3 ]

Figure 10.5.1 (map of the eight numbered intersections)

If the traffic officer begins at intersection 5, her probable locations, hour by hour, are given by the state vectors given in Table 2. For all values of n greater than 22, all state vectors are equal to $\mathbf{x}^{(22)}$ to three decimal places. Thus, as with the first two examples, the state vectors approach a fixed vector as n increases.

Table 2

  n        0     1     2     3     4     5     10    15    20    22
  x1(n)    0    .000  .133  .116  .130  .123  .113  .109  .108  .107
  x2(n)    0    .250  .146  .163  .140  .138  .115  .109  .108  .107
  x3(n)    0    .000  .050  .039  .067  .073  .100  .106  .107  .107
  x4(n)    0    .250  .113  .187  .162  .178  .178  .179  .179  .179
  x5(n)    1    .250  .279  .190  .190  .168  .149  .144  .143  .143
  x6(n)    0    .000  .000  .050  .056  .074  .099  .105  .107  .107
  x7(n)    0    .000  .133  .104  .131  .125  .138  .142  .143  .143
  x8(n)    0    .250  .146  .152  .124  .121  .108  .107  .107  .107



Limiting Behavior of the State Vectors

In our examples we saw that the state vectors approached some fixed vector as the number of observations 
increased. We now ask whether the state vectors always approach a fixed vector in a Markov chain. A simple 
example shows that this is not the case. 

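EXAMPLE 6 System Oscillates Between Two State Vectors

Let

\[ P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad \text{and} \quad \mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]

Then, because $P^2 = I$ and $P^3 = P$, we have that

\[ \mathbf{x}^{(0)} = \mathbf{x}^{(2)} = \mathbf{x}^{(4)} = \cdots = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]

and

\[ \mathbf{x}^{(1)} = \mathbf{x}^{(3)} = \mathbf{x}^{(5)} = \cdots = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

This system oscillates indefinitely between the two state vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, so it does not approach any fixed vector.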



However, if we impose a mild condition on the transition matrix, we can show that a fixed limiting state vector is 
approached. This condition is described by the following definition. 

DEFINITION 3

A transition matrix is regular if some integer power of it has all positive entries.

Thus, for a regular transition matrix P, there is some positive integer m such that all entries of $P^m$ are positive. This is the case with the transition matrices of Examples 1 and 2 for $m = 1$. In Example 5 it turns out that $P^4$ has
all positive entries. Consequently, in all three examples the transition matrices are regular. 
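
Definition 3 can be tested by brute force: compute successive powers of P and stop at the first one whose entries are all positive. The sketch below is illustrative, not a definitive method. The names `mat_mul` and `smallest_positive_power` and the search cutoff `max_power` are assumptions, and the street layout used to build the Example 5 matrix is an assumption read off from the nonzero pattern of that matrix, since the figure itself is not reproduced here.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def smallest_positive_power(P, max_power=50):
    """Smallest m <= max_power for which every entry of P^m is positive.

    Returns None if no such power is found within max_power steps.
    """
    power = [row[:] for row in P]
    for m in range(1, max_power + 1):
        if all(entry > 0 for row in power for entry in row):
            return m
        power = mat_mul(power, P)
    return None


# Assumed neighbor lists for the eight intersections of Example 5.
neighbors = {1: [2, 4], 2: [1, 5], 3: [4, 6], 4: [1, 3, 5, 7],
             5: [2, 4, 8], 6: [3, 7], 7: [4, 6, 8], 8: [5, 7]}
traffic = [[0.0] * 8 for _ in range(8)]
for j, adjacent in neighbors.items():
    choices = adjacent + [j]  # the officer may also stay where she is
    for i in choices:
        traffic[i - 1][j - 1] = 1.0 / len(choices)

cars = [[0.8, 0.3, 0.2], [0.1, 0.2, 0.6], [0.1, 0.5, 0.2]]
swap = [[0.0, 1.0], [1.0, 0.0]]  # the oscillating chain of Example 6

print(smallest_positive_power(cars))     # 1
print(smallest_positive_power(traffic))  # 4
print(smallest_positive_power(swap))     # None
```

A result of None only means that no positive power was found up to the cutoff; for the matrix of Example 6 the powers alternate between P and I, so no power is ever positive.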

A Markov chain that is governed by a regular transition matrix is called a regular Markov chain. We will see that every regular Markov chain has a fixed state vector q such that $P^n\mathbf{x}^{(0)}$ approaches q as n increases for any choice of $\mathbf{x}^{(0)}$. This result is of major importance in the theory of Markov chains. It is based on the following theorem.

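THEOREM 10.5.2 Behavior of $P^n$ as $n \to \infty$

If P is a regular transition matrix, then as $n \to \infty$,

\[
P^n \to
\begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\]

where the $q_i$ are positive numbers such that $q_1 + q_2 + \cdots + q_k = 1$.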



We will not prove this theorem here. We refer you to a more specialized text, such as J. Kemeny and J. Snell, Finite Markov Chains (New York: Springer-Verlag, 1976).



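Let us set

\[
Q =
\begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\quad \text{and} \quad
\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}
\]

Thus, Q is a transition matrix, all of whose columns are equal to the probability vector q. Q has the property that if x is any probability vector, then

\[
Q\mathbf{x} =
\begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
=
\begin{bmatrix}
q_1x_1 + q_1x_2 + \cdots + q_1x_k \\
q_2x_1 + q_2x_2 + \cdots + q_2x_k \\
\vdots \\
q_kx_1 + q_kx_2 + \cdots + q_kx_k
\end{bmatrix}
= (x_1 + x_2 + \cdots + x_k)
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}
= (1)\mathbf{q} = \mathbf{q}
\]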



That is, Q transforms any probability vector x into the fixed probability vector q. This result leads to the 
following theorem. 



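THEOREM 10.5.3 Behavior of $P^n\mathbf{x}$ as $n \to \infty$

If P is a regular transition matrix and x is any probability vector, then as $n \to \infty$,

\[ P^n\mathbf{x} \to \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix} = \mathbf{q} \]

where q is a fixed probability vector, independent of n, all of whose entries are positive.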



This result holds since Theorem 10.5.2 implies that $P^n \to Q$ as $n \to \infty$. This in turn implies that $P^n\mathbf{x} \to Q\mathbf{x} = \mathbf{q}$ as $n \to \infty$. Thus, for a regular Markov chain, the system eventually approaches a fixed state vector q. The vector q is called the steady-state vector of the regular Markov chain.

For systems with many states, usually the most efficient technique of computing the steady-state vector q is simply to calculate $P^n\mathbf{x}$ for some large n. Our examples illustrate this procedure. Each is a regular Markov process, so that convergence to a steady-state vector is ensured. Another way of computing the steady-state vector is to make use of the following theorem.




THEOREM 10.5.4 Steady-State Vector 



The steady-state vector q of a regular transition matrix P is the unique probability vector that satisfies the 
equation Pq = q. 



To see this, consider the matrix identity $PP^n = P^{n+1}$. By Theorem 10.5.2, both $P^n$ and $P^{n+1}$ approach Q as $n \to \infty$. Thus, we have $PQ = Q$. Any one column of this matrix equation gives $P\mathbf{q} = \mathbf{q}$. To show that q is the only probability vector that satisfies this equation, suppose r is another probability vector such that $P\mathbf{r} = \mathbf{r}$. Then also $P^n\mathbf{r} = \mathbf{r}$ for $n = 1, 2, \ldots$. When we let $n \to \infty$, Theorem 10.5.3 leads to $\mathbf{q} = \mathbf{r}$.

Theorem 10.5.4 can also be expressed by the statement that the homogeneous linear system

\[ (I - P)\mathbf{q} = \mathbf{0} \]

has a unique solution vector q with nonnegative entries that satisfy the condition $q_1 + q_2 + \cdots + q_k = 1$. We can apply this technique to the computation of the steady-state vectors for our examples.

EXAMPLE 7 Example 2 Revisited

In Example 2 the transition matrix was

\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]

so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\begin{bmatrix} .2 & -.3 \\ -.2 & .3 \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\tag{2}
\]

This leads to the single independent equation

\[ .2q_1 - .3q_2 = 0 \]

or

\[ q_1 = 1.5q_2 \]

Thus, when we set $q_2 = s$, any solution of 2 is of the form

\[ \mathbf{q} = s \begin{bmatrix} 1.5 \\ 1 \end{bmatrix} \]

where s is an arbitrary constant. To make the vector q a probability vector, we set $s = 1/(1.5 + 1) = .4$. Consequently,

\[ \mathbf{q} = \begin{bmatrix} .6 \\ .4 \end{bmatrix} \]

is the steady-state vector of this regular Markov chain. This means that over the long run, 60% of the alumni will give a donation in any one year, and 40% will not. Observe that this agrees with the result obtained numerically in Example 3.



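EXAMPLE 8 Example 1 Revisited

In Example 1 the transition matrix was

\[ P = \begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix} \]

so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\begin{bmatrix} .2 & -.3 & -.2 \\ -.1 & .8 & -.6 \\ -.1 & -.5 & .8 \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

The reduced row echelon form of the coefficient matrix is (verify)

\[
\begin{bmatrix} 1 & 0 & -\frac{34}{13} \\ 0 & 1 & -\frac{14}{13} \\ 0 & 0 & 0 \end{bmatrix}
\]

so the original linear system is equivalent to the system

\[
\begin{aligned}
q_1 &= \tfrac{34}{13} q_3 \\
q_2 &= \tfrac{14}{13} q_3
\end{aligned}
\]

When we set $q_3 = s$, any solution of the linear system is of the form

\[ \mathbf{q} = s \begin{bmatrix} \frac{34}{13} \\ \frac{14}{13} \\ 1 \end{bmatrix} \]

To make this a probability vector, we set

\[ s = \frac{1}{\frac{34}{13} + \frac{14}{13} + 1} = \frac{13}{61} \]

Thus, the steady-state vector of the system is

\[
\mathbf{q} = \begin{bmatrix} \frac{34}{61} \\ \frac{14}{61} \\ \frac{13}{61} \end{bmatrix}
= \begin{bmatrix} .5573\ldots \\ .2295\ldots \\ .2131\ldots \end{bmatrix}
\]
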

This agrees with the result obtained numerically in Table 1 . The entries of q give the long-run 
probabilities that any one car will be returned to location 1, 2, or 3, respectively. If the car rental 
agency has a fleet of 1000 cars, it should design its facilities so that there are at least 558 spaces at 
location 1, at least 230 spaces at location 2, and at least 214 spaces at location 3. 
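
The computation in Examples 7 and 8 can be sketched with exact rational arithmetic. The code below is an illustration under stated assumptions: the function name `steady_state` is hypothetical, the chain is assumed to be regular, and instead of reducing $(I - P)\mathbf{q} = \mathbf{0}$ and normalizing afterward, it replaces one redundant equation of that system with the condition $q_1 + q_2 + \cdots + q_k = 1$ and solves the resulting square system directly. The entries of P are passed as fractions so that no rounding occurs.

```python
from fractions import Fraction


def steady_state(P):
    """Solve (I - P) q = 0 with the entries of q summing to 1, exactly.

    Assumes P is a regular transition matrix given as rows of Fractions.
    """
    k = len(P)
    # Rows of I - P. The rows sum to the zero row, so one equation is
    # redundant; replace the last one by q_1 + ... + q_k = 1.
    A = [[(1 if i == j else 0) - Fraction(P[i][j]) for j in range(k)]
         for i in range(k)]
    A[k - 1] = [Fraction(1)] * k
    b = [Fraction(0)] * (k - 1) + [Fraction(1)]
    # Gauss-Jordan elimination on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Car rental chain of Example 1, with entries written as tenths.
P = [[Fraction(8, 10), Fraction(3, 10), Fraction(2, 10)],
     [Fraction(1, 10), Fraction(2, 10), Fraction(6, 10)],
     [Fraction(1, 10), Fraction(5, 10), Fraction(2, 10)]]
q = steady_state(P)
print(q)  # [Fraction(34, 61), Fraction(14, 61), Fraction(13, 61)]
```

For the two-state alumni chain of Example 7 the same function returns 3/5 and 2/5.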



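EXAMPLE 9 Example 5 Revisited

We will not give the details of the calculations but simply state that the unique probability vector solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\mathbf{q} =
\begin{bmatrix} \frac{3}{28} \\ \frac{3}{28} \\ \frac{3}{28} \\ \frac{5}{28} \\ \frac{4}{28} \\ \frac{3}{28} \\ \frac{4}{28} \\ \frac{3}{28} \end{bmatrix}
=
\begin{bmatrix} .1071\ldots \\ .1071\ldots \\ .1071\ldots \\ .1785\ldots \\ .1428\ldots \\ .1071\ldots \\ .1428\ldots \\ .1071\ldots \end{bmatrix}
\]
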

The entries in this vector indicate the proportion of time the traffic officer spends at each 
intersection over the long term. Thus, if the objective is for her to spend the same proportion of 
time at each intersection, then the strategy of random movement with equal probabilities from one 
intersection to another is not a good one. (See Exercise 5.) 



Exercise Set 10.5 

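1. Consider the transition matrix

\[ P = \begin{bmatrix} .4 & .5 \\ .6 & .5 \end{bmatrix} \]

(a) Calculate $\mathbf{x}^{(n)}$ for $n = 1, 2, 3, 4, 5$ if $\mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

(b) State why P is regular and find its steady-state vector.

Answer:

(a)
\[
\mathbf{x}^{(1)} = \begin{bmatrix} .4 \\ .6 \end{bmatrix}, \quad
\mathbf{x}^{(2)} = \begin{bmatrix} .46 \\ .54 \end{bmatrix}, \quad
\mathbf{x}^{(3)} = \begin{bmatrix} .454 \\ .546 \end{bmatrix}, \quad
\mathbf{x}^{(4)} = \begin{bmatrix} .4546 \\ .5454 \end{bmatrix}, \quad
\mathbf{x}^{(5)} = \begin{bmatrix} .45454 \\ .54546 \end{bmatrix}
\]

(b) P is regular since all entries of P are positive; $\mathbf{q} = \begin{bmatrix} \frac{5}{11} \\ \frac{6}{11} \end{bmatrix}$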

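2. Consider the transition matrix

\[ P = \begin{bmatrix} .2 & .1 & .7 \\ .6 & .4 & .2 \\ .2 & .5 & .1 \end{bmatrix} \]

(a) Calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$ to three decimal places if $\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$.

(b) State why P is regular and find its steady-state vector.

Answer:

(a)
\[
\mathbf{x}^{(1)} = \begin{bmatrix} .7 \\ .2 \\ .1 \end{bmatrix}, \quad
\mathbf{x}^{(2)} = \begin{bmatrix} .23 \\ .52 \\ .25 \end{bmatrix}, \quad
\mathbf{x}^{(3)} = \begin{bmatrix} .273 \\ .396 \\ .331 \end{bmatrix}
\]

(b) P is regular, since all entries of P are positive; $\mathbf{q} = \begin{bmatrix} \frac{22}{72} \\ \frac{29}{72} \\ \frac{21}{72} \end{bmatrix}$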

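3. Find the steady-state vectors of the following regular transition matrices:

(a) $\begin{bmatrix} \frac{1}{3} & \frac{3}{4} \\ \frac{2}{3} & \frac{1}{4} \end{bmatrix}$

(b) $\begin{bmatrix} .81 & .26 \\ .19 & .74 \end{bmatrix}$

(c) $\begin{bmatrix} \frac{1}{3} & \frac{1}{2} & 0 \\ \frac{1}{3} & 0 & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{2} & \frac{3}{4} \end{bmatrix}$

Answer:

(a) $\begin{bmatrix} \frac{9}{17} \\ \frac{8}{17} \end{bmatrix}$

(b) $\begin{bmatrix} \frac{26}{45} \\ \frac{19}{45} \end{bmatrix}$

(c) $\begin{bmatrix} \frac{3}{19} \\ \frac{4}{19} \\ \frac{12}{19} \end{bmatrix}$
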

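4. Let P be the transition matrix

\[ P = \begin{bmatrix} \frac{1}{2} & 0 \\ \frac{1}{2} & 1 \end{bmatrix} \]

(a) Show that P is not regular.

(b) Show that as n increases, $P^n\mathbf{x}^{(0)}$ approaches $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any initial state vector $\mathbf{x}^{(0)}$.

(c) What conclusion of Theorem 10.5.3 is not valid for the steady state of this transition matrix?

Answer:

(a) $P^n = \begin{bmatrix} \left(\frac{1}{2}\right)^n & 0 \\ 1 - \left(\frac{1}{2}\right)^n & 1 \end{bmatrix}$, $n = 1, 2, \ldots$. Thus, no integer power of P has all positive entries.

(b) $P^n \to \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$ as n increases, so $P^n\mathbf{x}^{(0)} \to \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any $\mathbf{x}^{(0)}$ as n increases.

(c) The entries of the limiting vector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are not all positive.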



5. Verify that if P is a $k \times k$ regular transition matrix all of whose row sums are equal to 1, then the entries of its steady-state vector are all equal to $1/k$.

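6. Show that the transition matrix

\[ P = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{bmatrix} \]

is regular, and use Exercise 5 to find its steady-state vector.

Answer:

\[ P^2 = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \end{bmatrix} \]

has all positive entries; $\mathbf{q} = \begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{bmatrix}$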



7. John is either happy or sad. If he is happy one day, then he is happy the next day four times out of five. If he 
is sad one day, then he is sad the next day one time out of three. Over the long term, what are the chances that 
John is happy on any given day? 

Answer: 

10/13

8. A country is divided into three demographic regions. It is found that each year 5% of the residents of region 1 
move to region 2, and 5% move to region 3. Of the residents of region 2, 15% move to region 1 and 10% 
move to region 3. And of the residents of region 3, 10% move to region 1 and 5% move to region 2. What 
percentage of the population resides in each of the three regions after a long period of time? 

Answer: 



54 1/6 % in region 1, 16 2/3 % in region 2, and 29 1/6 % in region 3



Section 10.5 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 



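T1. Consider the sequence of transition matrices

\[ \{P_2, P_3, P_4, \ldots\} \]

with

\[
P_2 = \begin{bmatrix} 0 & \frac{1}{2} \\ 1 & \frac{1}{2} \end{bmatrix}, \quad
P_3 = \begin{bmatrix} 0 & 0 & \frac{1}{3} \\ 0 & \frac{1}{2} & \frac{1}{3} \\ 1 & \frac{1}{2} & \frac{1}{3} \end{bmatrix}, \quad
P_4 = \begin{bmatrix} 0 & 0 & 0 & \frac{1}{4} \\ 0 & 0 & \frac{1}{3} & \frac{1}{4} \\ 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \end{bmatrix}, \quad
P_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & \frac{1}{5} \\ 0 & 0 & 0 & \frac{1}{4} & \frac{1}{5} \\ 0 & 0 & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{bmatrix}
\]

and so on.

(a) Use a computer to show that each of these four matrices is regular by computing their squares.

(b) Verify Theorem 10.5.2 by computing the 100th power of $P_k$ for $k = 2, 3, 4, 5$. Then make a conjecture as to the limiting value of $P_k^n$ as $n \to \infty$ for all $k = 2, 3, 4, \ldots$.

(c) Verify that the common column $\mathbf{q}_k$ of the limiting matrix you found in part (b) satisfies the equation $P_k\mathbf{q}_k = \mathbf{q}_k$, as required by Theorem 10.5.4.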

T2. A mouse is placed in a box with nine rooms as shown in the accompanying figure. Assume that it is equally 
likely that the mouse goes through any door in the room or stays in the room. 

(a) Construct the 9 x 9 transition matrix for this problem and show that it is regular. 

(b) Determine the steady-state vector for the matrix. 

(c) Use a symmetry argument to show that this problem may be solved using only a 3 x 3 matrix. 






Figure Ex-T2 
10.6 Graph Theory

In this section we introduce matrix representations of relations among members of a set. We use matrix 
arithmetic to analyze these relationships. 




Prerequisites 

Matrix Addition and Multiplication 



Relations Among Members of a Set 

There are countless examples of sets with finitely many members in which some relation exists among
members of the set. For example, the set could consist of a collection of people, animals, countries,
companies, sports teams, or cities; and the relation between two members, A and B, of such a set could be that
person A dominates person B, animal A feeds on animal B, country A militarily supports country B, company
A sells its product to company B, sports team A consistently beats sports team B, or city A has a direct airline
flight to city B.

We will now show how the theory of directed graphs can be used to mathematically model relations such as 
those in the preceding examples. 



Directed Graphs 

A directed graph is a finite set of elements, $\{P_1, P_2, \ldots, P_n\}$, together with a finite collection of ordered
pairs $(P_i, P_j)$ of distinct elements of this set, with no ordered pair being repeated. The elements of the set are
called vertices, and the ordered pairs are called directed edges, of the directed graph. We use the notation
$P_i \to P_j$ (which is read "$P_i$ is connected to $P_j$") to indicate that the directed edge $(P_i, P_j)$ belongs to the
directed graph. Geometrically, we can visualize a directed graph (Figure 10.6.1) by representing the vertices
as points in the plane and representing the directed edge $P_i \to P_j$ by drawing a line or arc from vertex $P_i$ to
vertex $P_j$, with an arrow pointing from $P_i$ to $P_j$. If both $P_i \to P_j$ and $P_j \to P_i$ hold (denoted $P_i \leftrightarrow P_j$), we
draw a single line between $P_i$ and $P_j$ with two oppositely pointing arrows (as with $P_2$ and $P_3$ in the figure).




Figure 10.6.1 



As in Figure 10.6.1, for example, a directed graph may have separate "components" of vertices that are
connected only among themselves; and some vertices (one such vertex appears in the figure) may not be
connected with any other vertex. Also, because $P_i \to P_i$ is not permitted in a directed graph, a vertex cannot
be connected with itself by a single arc that does not pass through any other vertex.



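Figure 10.6.2 shows diagrams representing three more examples of directed graphs. With a directed graph
having n vertices, we may associate an $n \times n$ matrix $M = [m_{ij}]$, called the vertex matrix of the directed
graph. Its elements are defined by

\[
m_{ij} = \begin{cases} 1, & \text{if } P_i \to P_j \\ 0, & \text{otherwise} \end{cases}
\]

for $i, j = 1, 2, \ldots, n$. For the three directed graphs in Figure 10.6.2, the corresponding vertex matrices are

\[
\text{Figure a:} \quad M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

\[
\text{Figure b:} \quad M = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}
\]

\[
\text{Figure c:} \quad M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\]

Figure 10.6.2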





By their definition, vertex matrices have the following two properties: 

(i) All entries are either 0 or 1 . 

(ii) All diagonal entries are 0. 

Conversely, any matrix with these two properties determines a unique directed graph having the given matrix
as its vertex matrix. For example, the matrix

\[
M = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

determines the directed graph in Figure 10.6.3.




Figure 10.6.3 
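
The correspondence between matrices with properties (i) and (ii) and directed graphs can be illustrated with a
short Python sketch. It checks the two properties and then lists the directed edges that the matrix determines.
The function name `edges_from_vertex_matrix` is our own choice for illustration, not notation from the text, and
the matrix entries are the 4 x 4 example above as we read it; vertices are numbered from 1 to match
$P_1, \ldots, P_n$.

```python
def edges_from_vertex_matrix(M):
    """Return the directed edges (i, j), numbered from 1, encoded by M.

    Raises ValueError if M is not square, has an entry other than 0 or 1
    (property (i)), or has a nonzero diagonal entry (property (ii)).
    """
    n = len(M)
    for i, row in enumerate(M):
        if len(row) != n:
            raise ValueError("vertex matrix must be square")
        for j, entry in enumerate(row):
            if entry not in (0, 1):
                raise ValueError("entries must be 0 or 1")
            if i == j and entry != 0:
                raise ValueError("diagonal entries must be 0")
    # m_ij = 1 means there is a directed edge P_i -> P_j.
    return [(i + 1, j + 1) for i in range(n) for j in range(n) if M[i][j] == 1]


# The 4 x 4 example matrix (hypothetical variable name M).
M = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
edges = edges_from_vertex_matrix(M)
print(edges)
```

Each pair (i, j) in the result corresponds to one arrow $P_i \to P_j$ in the diagram, so drawing the graph amounts
to drawing one arrow per listed pair.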



EXAMPLE 1 Influences Within a Family

A certain family consists of a mother, father, daughter, and two sons. The family members have
influence, or power, over each other in the following ways: the mother can influence the
daughter and the oldest son; the father can influence the two sons; the daughter can influence
the father; the oldest son can influence the youngest son; and the youngest son can influence the
mother. We may model this family influence pattern with a directed graph whose vertices are
the five family members. If family member A influences family member B, we write $A \to B$.
Figure 10.6.4 is the resulting directed graph, where we have used obvious letter designations for
the five family members. The vertex matrix of this directed graph is

\[
\begin{array}{c|ccccc}
   & M & F & D & OS & YS \\ \hline
M  & 0 & 0 & 1 & 1 & 0 \\
F  & 0 & 0 & 0 & 1 & 1 \\
D  & 0 & 1 & 0 & 0 & 0 \\
OS & 0 & 0 & 0 & 0 & 1 \\
YS & 1 & 0 & 0 & 0 & 0
\end{array}
\]

Figure 10.6.4



EXAMPLE 2 Vertex Matrix: Moves on a Chessboard

In chess the knight moves in an "L"-shaped pattern about the chessboard. For the board in
Figure 10.6.5 it may move horizontally two squares and then vertically one square, or it may
move vertically two squares and then horizontally one square. Thus, from the center square in
the figure, the knight may move to any of the eight marked shaded squares. Suppose that the
knight is restricted to the nine numbered squares in Figure 10.6.6. If by $i \to j$ we mean that the
knight may move from square i to square j, the directed graph in Figure 10.6.7 illustrates all
possible moves that the knight may make among these nine squares. In Figure 10.6.8 we have
"unraveled" Figure 10.6.7 to make the pattern of possible moves clearer.



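The vertex matrix of this directed graph is given by

\[
M = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

Figure 10.6.5

Figure 10.6.6 (the nine squares, numbered 1 2 3 in the top row, 4 5 6 in the middle row, and 7 8 9 in the bottom row)

Figure 10.6.7

Figure 10.6.8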



In Example 1 the father cannot directly influence the mother; that is, $F \to M$ is not true. But he can influence
the youngest son, who can then influence the mother. We write this as $F \to YS \to M$ and call it a 2-step
connection from F to M. Analogously, we call $M \to D$ a 1-step connection, $F \to OS \to YS \to M$ a 3-step
connection, and so forth. Let us now consider a technique for finding the number of all possible r-step
connections ($r = 1, 2, \ldots$) from one vertex $P_i$ to another vertex $P_j$ of an arbitrary directed graph. (This will
include the case when $P_i$ and $P_j$ are the same vertex.) The number of 1-step connections from $P_i$ to $P_j$ is
simply $m_{ij}$. That is, there is either zero or one 1-step connection from $P_i$ to $P_j$, depending on whether $m_{ij}$ is
zero or one. For the number of 2-step connections, we consider the square of the vertex matrix. If we let
$m_{ij}^{(2)}$ be the (i, j)-th element of $M^2$, we have

\[
m_{ij}^{(2)} = m_{i1} m_{1j} + m_{i2} m_{2j} + \cdots + m_{in} m_{nj} \tag{1}
\]

Now, if $m_{i1} = m_{1j} = 1$, there is a 2-step connection $P_i \to P_1 \to P_j$ from $P_i$ to $P_j$. But if either $m_{i1}$ or
$m_{1j}$ is zero, such a 2-step connection is not possible. Thus $P_i \to P_1 \to P_j$ is a 2-step connection if and only if
$m_{i1} m_{1j} = 1$. Similarly, for any $k = 1, 2, \ldots, n$, $P_i \to P_k \to P_j$ is a 2-step connection from $P_i$ to $P_j$ if and
only if the term $m_{ik} m_{kj}$ on the right side of (1) is one; otherwise, the term is zero. Thus, the right side of (1) is
the total number of 2-step connections from $P_i$ to $P_j$.

A similar argument will work for finding the number of 3-, 4-, ..., r-step connections from $P_i$ to $P_j$. In
general, we have the following result.

THEOREM 10.6.1

Let $M$ be the vertex matrix of a directed graph and let $m_{ij}^{(r)}$ be the (i, j)-th element of $M^r$. Then $m_{ij}^{(r)}$
is equal to the number of r-step connections from $P_i$ to $P_j$.
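
As a hedged illustration of Theorem 10.6.1, the following Python sketch counts r-step connections by raising a
vertex matrix to the rth power. It uses the family of Example 1, with the matrix rebuilt from the influences
listed there; the helper names `mat_mul` and `mat_pow` are our own and plain lists are used so that no library is
assumed.

```python
def mat_mul(A, B):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, r):
    """Return the rth power of M for an integer r >= 1."""
    result = M
    for _ in range(r - 1):
        result = mat_mul(result, M)
    return result


# Family of Example 1, vertices ordered M, F, D, OS, YS.
names = ["M", "F", "D", "OS", "YS"]
family = [
    [0, 0, 1, 1, 0],  # mother influences daughter and oldest son
    [0, 0, 0, 1, 1],  # father influences both sons
    [0, 1, 0, 0, 0],  # daughter influences father
    [0, 0, 0, 0, 1],  # oldest son influences youngest son
    [1, 0, 0, 0, 0],  # youngest son influences mother
]

f, m = names.index("F"), names.index("M")
one_step = mat_pow(family, 1)[f][m]    # F -> M directly
two_step = mat_pow(family, 2)[f][m]    # F -> ? -> M
three_step = mat_pow(family, 3)[f][m]  # F -> ? -> ? -> M
print(one_step, two_step, three_step)
```

The counts agree with the discussion preceding the theorem: there is no 1-step connection from F to M, one 2-step
connection ($F \to YS \to M$), and one 3-step connection ($F \to OS \to YS \to M$).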



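EXAMPLE 3 Using Theorem 10.6.1

Figure 10.6.9 is the route map of a small airline that services the four cities $P_1$, $P_2$, $P_3$, $P_4$. As
a directed graph, its vertex matrix is

\[
M = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}
\]

We have that

\[
M^2 = \begin{bmatrix} 2 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 2 & 0 & 1 & 1 \end{bmatrix}
\quad \text{and} \quad
M^3 = \begin{bmatrix} 1 & 3 & 3 & 1 \\ 2 & 2 & 3 & 1 \\ 4 & 0 & 2 & 2 \\ 1 & 3 & 3 & 1 \end{bmatrix}
\]

If we are interested in connections from city $P_4$ to city $P_3$, we may use Theorem 10.6.1 to find
their number. Because $m_{43} = 1$, there is one 1-step connection; because $m_{43}^{(2)} = 1$, there is one
2-step connection; and because $m_{43}^{(3)} = 3$, there are three 3-step connections. To verify this,
from Figure 10.6.9 we find

1-step connections from $P_4$ to $P_3$:   $P_4 \to P_3$

2-step connections from $P_4$ to $P_3$:   $P_4 \to P_2 \to P_3$

3-step connections from $P_4$ to $P_3$:   $P_4 \to P_2 \to P_1 \to P_3$
                                          $P_4 \to P_3 \to P_1 \to P_3$
                                          $P_4 \to P_3 \to P_4 \to P_3$

Figure 10.6.9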



Cliques 

In everyday language a "clique" is a closely knit group of people (usually three or more) that tends to 
communicate within itself and has no place for outsiders. In graph theory this concept is given a more precise 
meaning. 

DEFINITION 1

A subset of a directed graph is called a clique if it satisfies the following three conditions:

(i) The subset contains at least three vertices.

(ii) For each pair of vertices $P_i$ and $P_j$ in the subset, both $P_i \to P_j$ and $P_j \to P_i$ are true.

(iii) The subset is as large as possible; that is, it is not possible to add another vertex to the subset and
still satisfy condition (ii).

This definition suggests that cliques are maximal subsets that are in perfect "communication" with each other.
For example, if the vertices represent cities, and $P_i \to P_j$ means that there is a direct airline flight from city
$P_i$ to city $P_j$, then there is a direct flight between any two cities within a clique in either direction.

EXAMPLE 4 A Directed Graph with Two Cliques

The directed graph illustrated in Figure 10.6.10 (which might represent the route map of an
airline) has two cliques:

\[
\{P_1, P_2, P_3, P_4\} \quad \text{and} \quad \{P_3, P_4, P_6\}
\]

This example shows that a directed graph may contain several cliques and that a vertex may
simultaneously belong to more than one clique.

Figure 10.6.10



For simple directed graphs, cliques can be found by inspection. But for large directed graphs, it would be
desirable to have a systematic procedure for detecting cliques. For this purpose, it will be helpful to define a
matrix $S = [s_{ij}]$ related to a given directed graph as follows:

\[
s_{ij} = \begin{cases} 1, & \text{if } P_i \leftrightarrow P_j \\ 0, & \text{otherwise} \end{cases}
\]

The matrix S determines a directed graph that is the same as the given directed graph, with the exception that
the directed edges with only one arrow are deleted. For example, if the original directed graph is given by
Figure 10.6.11a, the directed graph that has S as its vertex matrix is given in Figure 10.6.11b. The matrix S
may be obtained from the vertex matrix M of the original directed graph by setting $s_{ij} = 1$ if $m_{ij} = m_{ji} = 1$
and setting $s_{ij} = 0$ otherwise.




Figure 10.6.11 



The following theorem, which uses the matrix S, is helpful for identifying cliques.

THEOREM 10.6.2 Identifying Cliques

Let $s_{ij}^{(3)}$ be the (i, j)-th element of $S^3$. Then a vertex $P_i$ belongs to some clique if and only if $s_{ii}^{(3)} \neq 0$.

Proof If $s_{ii}^{(3)} \neq 0$, then there is at least one 3-step connection from $P_i$ to itself in the modified directed graph
determined by S. Suppose it is $P_i \to P_j \to P_k \to P_i$. In the modified directed graph, all directed relations are
two-way, so we also have the connections $P_i \leftrightarrow P_j \leftrightarrow P_k \leftrightarrow P_i$. But this means that $\{P_i, P_j, P_k\}$ is either a
clique or a subset of a clique. In either case, $P_i$ must belong to some clique. The converse statement, "if $P_i$
belongs to a clique, then $s_{ii}^{(3)} \neq 0$," follows in a similar manner.



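EXAMPLE 5 Using Theorem 10.6.2

Suppose that a directed graph has as its vertex matrix

\[
M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\]

Then

\[
S = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
S^3 = \begin{bmatrix} 0 & 3 & 0 & 2 \\ 3 & 0 & 2 & 0 \\ 0 & 2 & 0 & 1 \\ 2 & 0 & 1 & 0 \end{bmatrix}
\]

Because all diagonal entries of $S^3$ are zero, it follows from Theorem 10.6.2 that the directed
graph has no cliques.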



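EXAMPLE 6 Using Theorem 10.6.2

Suppose that a directed graph has as its vertex matrix

\[
M = \begin{bmatrix}
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0
\end{bmatrix}
\]

Then

\[
S = \begin{bmatrix}
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad \text{and} \quad
S^3 = \begin{bmatrix}
2 & 4 & 0 & 4 & 3 \\
4 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 \\
4 & 3 & 0 & 2 & 1 \\
3 & 1 & 0 & 1 & 0
\end{bmatrix}
\]

The nonzero diagonal entries of $S^3$ are $s_{11}^{(3)}$, $s_{22}^{(3)}$, and $s_{44}^{(3)}$. Consequently, in the given directed
graph, $P_1$, $P_2$, and $P_4$ belong to cliques. Because a clique must contain at least three vertices,
the directed graph has only one clique, $\{P_1, P_2, P_4\}$.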
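
The test in Theorem 10.6.2 is easy to automate. The following Python sketch, offered only as an illustration,
builds S from a vertex matrix, cubes it, and reports the vertices whose diagonal entry in $S^3$ is nonzero. The
function names are our own, and the data is the 5 x 5 vertex matrix of the example just above, as we read it.

```python
def mat_mul(A, B):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def symmetric_part(M):
    """Build S, where s_ij = 1 exactly when m_ij = m_ji = 1."""
    n = len(M)
    return [[1 if M[i][j] == 1 and M[j][i] == 1 else 0 for j in range(n)]
            for i in range(n)]


def vertices_in_cliques(M):
    """Return the indices i (numbered from 1) with a nonzero (i, i) entry of S^3."""
    S = symmetric_part(M)
    S3 = mat_mul(mat_mul(S, S), S)
    return [i + 1 for i in range(len(M)) if S3[i][i] != 0]


M = [
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
]
in_cliques = vertices_in_cliques(M)
print(in_cliques)
```

Note that the theorem only says which vertices belong to some clique; it does not by itself list the cliques. Here
exactly three vertices qualify, so they must form the single clique found in the example.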



Dominance-Directed Graphs 

In many groups of individuals or animals, there is a definite "pecking order" or dominance relation between
any two members of the group. That is, given any two individuals A and B, either A dominates B or B
dominates A, but not both. In terms of a directed graph in which $P_i \to P_j$ means $P_i$ dominates $P_j$, this means
that for all distinct pairs, either $P_i \to P_j$ or $P_j \to P_i$, but not both. In general, we have the following
definition.

DEFINITION 2

A dominance-directed graph is a directed graph such that for any distinct pair of vertices $P_i$ and $P_j$,
either $P_i \to P_j$ or $P_j \to P_i$, but not both.



An example of a directed graph satisfying this definition is a league of n sports teams that play each other
exactly one time, as in one round of a round-robin tournament in which no ties are allowed. If $P_i \to P_j$ means
that team $P_i$ beat team $P_j$ in their single match, it is easy to see that the definition of a dominance-directed
graph is satisfied. For this reason, dominance-directed graphs are sometimes called tournaments.

Figure 10.6.12 illustrates some dominance-directed graphs with three, four, and five vertices, respectively. In 
these three graphs, the circled vertices have the following interesting property: from each one there is either a 
1-step or a 2-step connection to any other vertex in its graph. In a sports tournament, these vertices would 
correspond to the most "powerful" teams in the sense that these teams either beat any given team or beat 
some other team that beat the given team. We can now state and prove a theorem that guarantees that any 
dominance-directed graph has at least one vertex with this property. 

THEOREM 10.6.3 Connections in Dominance-Directed Graphs

In any dominance-directed graph, there is at least one vertex from which there is a 1-step or 2-step
connection to any other vertex.

Proof Consider a vertex (there may be several) with the largest total number of 1-step and 2-step
connections to other vertices in the graph. By renumbering the vertices, we may assume that $P_1$ is such a
vertex. Suppose there is some vertex $P_i$ such that there is no 1-step or 2-step connection from $P_1$ to $P_i$. Then,
in particular, $P_1 \to P_i$ is not true, so that by definition of a dominance-directed graph, it must be that
$P_i \to P_1$. Next, let $P_k$ be any vertex such that $P_1 \to P_k$ is true. Then we cannot have $P_k \to P_i$, as then
$P_1 \to P_k \to P_i$ would be a 2-step connection from $P_1$ to $P_i$. Thus, it must be that $P_i \to P_k$. That is, $P_i$ has
1-step connections to all the vertices to which $P_1$ has 1-step connections. The vertex $P_i$ must then also have
2-step connections to all the vertices to which $P_1$ has 2-step connections. But because, in addition, we have
that $P_i \to P_1$, this means that $P_i$ has more 1-step and 2-step connections to other vertices than does $P_1$.
However, this contradicts the way in which $P_1$ was chosen. Hence, there can be no vertex $P_i$ to which $P_1$ has
no 1-step or 2-step connection.

Figure 10.6.12

This proof shows that a vertex with the largest total number of 1-step and 2-step connections to other vertices
has the property stated in the theorem. There is a simple way of finding such vertices using the vertex matrix
$M$ and its square $M^2$. The sum of the entries in the ith row of $M$ is the total number of 1-step connections
from $P_i$ to other vertices, and the sum of the entries of the ith row of $M^2$ is the total number of 2-step
connections from $P_i$ to other vertices. Consequently, the sum of the entries of the ith row of the matrix
$A = M + M^2$ is the total number of 1-step and 2-step connections from $P_i$ to other vertices. In other words,
a row of $A = M + M^2$ with the largest row sum identifies a vertex having the property stated in Theorem
10.6.3.

EXAMPLE 7 Using Theorem 10.6.3

Suppose that five baseball teams play each other exactly once, and the results are as indicated in
the dominance-directed graph of Figure 10.6.13. The vertex matrix of the graph is

\[
M = \begin{bmatrix}
0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0
\end{bmatrix}
\]

so

\[
A = M + M^2 =
\begin{bmatrix}
0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0
\end{bmatrix}
+
\begin{bmatrix}
0 & 1 & 0 & 1 & 0 \\
1 & 0 & 2 & 3 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 2 & 0
\end{bmatrix}
=
\begin{bmatrix}
0 & 1 & 1 & 2 & 0 \\
2 & 0 & 3 & 3 & 1 \\
0 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 1 \\
1 & 1 & 2 & 3 & 0
\end{bmatrix}
\]

The row sums of A are

1st row sum = 4

2nd row sum = 9

3rd row sum = 2

4th row sum = 4

5th row sum = 7

Because the second row has the largest row sum, the vertex $P_2$ must have a 1-step or 2-step
connection to any other vertex. This is easily verified from Figure 10.6.13.

Figure 10.6.13



We have informally suggested that a vertex with the largest number of 1-step and 2-step connections to other
vertices is a "powerful" vertex. We can formalize this concept with the following definition.

DEFINITION 3

The power of a vertex of a dominance-directed graph is the total number of 1-step and 2-step
connections from it to other vertices. Alternatively, the power of a vertex $P_i$ is the sum of the entries
of the ith row of the matrix $A = M + M^2$, where $M$ is the vertex matrix of the directed graph.



EXAMPLE 8 Example 7 Revisited

Let us rank the five baseball teams in Example 7 according to their powers. From the
calculations for the row sums in that example, we have

Power of team $P_1$ = 4

Power of team $P_2$ = 9

Power of team $P_3$ = 2

Power of team $P_4$ = 4

Power of team $P_5$ = 7

Hence, the ranking of the teams according to their powers would be

$P_2$ (first), $P_5$ (second), $P_1$ and $P_4$ (tied for third), $P_3$ (last)
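
The power computation of Definition 3 can be sketched in a few lines of Python. This is only an illustration:
the function name `vertex_powers` is our own, and the matrix is the vertex matrix of Example 7 as we read it.

```python
def vertex_powers(M):
    """Return the row sums of A = M + M^2, one per vertex."""
    n = len(M)
    M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [sum(M[i][j] + M2[i][j] for j in range(n)) for i in range(n)]


# Vertex matrix of the five-team tournament in Example 7.
M = [
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
]
powers = vertex_powers(M)

# Team numbers sorted from most to least powerful; ties keep numerical order.
ranking = sorted(range(1, len(M) + 1), key=lambda i: -powers[i - 1])
print(powers, ranking)
```

Because Python's sort is stable, tied teams (here $P_1$ and $P_4$) stay in numerical order; the code does not mark
them as tied, so ties should be read off from the list of powers.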

Exercise Set 10.6 

1. Construct the vertex matrix for each of the directed graphs illustrated in Figure Ex-1. 




Pi 




Figure Ex-1 



Answer: 



0 


0 


0 


1 

1 






1 


0 


1 


1 






1 


1 


0 


1 






0 


0 


0 


0 






0 


■1 

1 


1 


0 


IJ 




0 


0 


0 


n 


1 

1 




1 


0 


0 


1 


0 




0 


0 


1 


0 


0 




0 


0 


1 


0 


0 




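(c)

\[
\begin{bmatrix}
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}
\]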



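2. Draw a diagram of the directed graph corresponding to each of the following vertex matrices.

(a)

\[
\begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}
\]

(b)

\[
\begin{bmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0
\end{bmatrix}
\]

(c)

\[
\begin{bmatrix}
0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}
\]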



Answer: 





3. Let M be the following vertex matrix of a directed graph:

\[
M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}
\]

(a) Draw a diagram of the directed graph.

(b) Use Theorem 10.6.1 to find the number of 1-, 2-, and 3-step connections from the vertex $P_1$ to the
vertex $P_2$. Verify your answer by listing the various connections as in Example 3.

(c) Repeat part (b) for the 1-, 2-, and 3-step connections from $P_1$ to $P_4$.

Answer:

(b) 1-step: $P_1 \to P_2$

2-step: $P_1 \to P_4 \to P_2$
        $P_1 \to P_3 \to P_2$

3-step: $P_1 \to P_2 \to P_1 \to P_2$
        $P_1 \to P_3 \to P_4 \to P_2$
        $P_1 \to P_4 \to P_3 \to P_2$

(c) 1-step: $P_1 \to P_4$

2-step: $P_1 \to P_3 \to P_4$

3-step: $P_1 \to P_2 \to P_1 \to P_4$
        $P_1 \to P_4 \to P_3 \to P_4$

4. (a) Compute the matrix product $M^T M$ for the vertex matrix $M$ in Example 1.

(b) Verify that the kth diagonal entry of $M^T M$ is the number of family members who influence the kth
family member. Why is this true?

(c) Find a similar interpretation for the values of the nondiagonal entries of $M^T M$.

Answer:

(a)

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 2 & 1 \\
0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

(c) The ijth entry is the number of family members who influence both the ith and jth family members.
5. By inspection, locate all cliques in each of the directed graphs illustrated in Figure Ex-5. 




Figure Ex-5 

Answer: 

(a) $\{P_1, P_2, P_3\}$

(b) $\{P_3, P_4, P_5\}$

(c) $\{P_2, P_4, P_6, P_8\}$ and $\{P_4, P_5, P_6\}$

6. For each of the following vertex matrices, use Theorem 10.6.2 to find all cliques in the corresponding 
directed graphs. 



0 


1 


0 


1 


0 




1 


IJ 


-1 

1 




1 




n 
*j 


1 

1 


0 


1 

1 


1 

i 




1 
1 


0 


0 


n 


1 

1 




1 
1 




1 

1 


1 

i 






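(b)

\[
\begin{bmatrix}
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0
\end{bmatrix}
\]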



Answer: 

(a) None

(b) $\{P_3, P_4, P_6\}$

7. For the dominance-directed graph illustrated in Figure Ex-7 construct the vertex matrix and find the power 
of each vertex. 

Pi 




Figure Ex-7 



Answer:

\[
\begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\]

Power of $P_1$ = 5
Power of $P_2$ = 3
Power of $P_3$ = 4
Power of $P_4$ = 2

8. Five baseball teams play each other one time with the following results:

A beats B, C, D
B beats C, E
C beats D, E
D beats B
E beats A, D

Rank the five baseball teams in accordance with the powers of the vertices they correspond to in the 
dominance-directed graph representing the outcomes of the games. 



Answer: 



First, A; second, B and E (tie); fourth, C; fifth, D 

Section 10.6 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 



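T1. A graph having n vertices such that every vertex is connected to every other vertex has a vertex matrix
given by

\[
M_n = \begin{bmatrix}
0 & 1 & 1 & \cdots & 1 \\
1 & 0 & 1 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & \cdots & 0
\end{bmatrix}
\]

In this problem we develop a formula for $M_n^k$ whose (i, j)-th entry equals the number of k-step connections
from $P_i$ to $P_j$.

(a) Use a computer to compute the eight matrices $M_n^k$ for $n = 2, 3$ and for $k = 2, 3, 4, 5$.

(b) Use the results in part (a) and symmetry arguments to show that $M_n^k$ can be written as

\[
M_n^k = \begin{bmatrix}
\alpha_k & \beta_k & \beta_k & \cdots & \beta_k \\
\beta_k & \alpha_k & \beta_k & \cdots & \beta_k \\
\beta_k & \beta_k & \alpha_k & \cdots & \beta_k \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\beta_k & \beta_k & \beta_k & \cdots & \alpha_k
\end{bmatrix}
\]

(c) Using the fact that $M_n^k = M_n M_n^{k-1}$, show that

\[
\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}
= \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}
\begin{bmatrix} \alpha_{k-1} \\ \beta_{k-1} \end{bmatrix}
\]

with

\[
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

(d) Using part (c), show that

\[
\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}
= \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}^{k-1}
\begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

(e) Use the methods of Section 5.2 to compute

\[
\begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}^{k-1}
\]

and thereby obtain expressions for $\alpha_k$ and $\beta_k$, and eventually show that

\[
M_n^k = \left( \frac{(n-1)^k - (-1)^k}{n} \right) U_n + (-1)^k I_n
\]

where $U_n$ is the $n \times n$ matrix all of whose entries are ones and $I_n$ is the $n \times n$ identity matrix.

(f) Show that for $n > 2$, all vertices for these directed graphs belong to cliques.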

T2. Consider a round-robin tournament among n players (labeled $a_1, a_2, a_3, \ldots, a_n$) where $a_1$ beats $a_2$, $a_2$
beats $a_3$, $a_3$ beats $a_4$, ..., $a_{n-1}$ beats $a_n$, and $a_n$ beats $a_1$. Compute the "power" of each player, showing that
they all have the same power; then determine that common power.

[Hint: Use a computer to study the cases $n = 3, 4, 5, 6$; then make a conjecture and prove your conjecture to
be true.]
10.7 Games of Strategy

In this section we discuss a general game in which two competing players choose separate strategies to reach opposing 
objectives. The optimal strategy of each player is found in certain cases with the use of matrix techniques. 



Prerequisites 

Matrix Multiplication 
Basic Probability Concepts 



Game Theory 

To introduce the basic concepts in the theory of games, we will consider the following carnival-type game that two
people agree to play. We will call the participants in the game player R and player C. Each player has a stationary wheel
with a movable pointer on it as in Figure 10.7.1. For reasons that will become clear, we will call player R's wheel the
row-wheel and player C's wheel the column-wheel. The row-wheel is divided into three sectors numbered 1, 2, and 3,
and the column-wheel is divided into four sectors numbered 1, 2, 3, and 4. The fractions of the area occupied by the
various sectors are indicated in the figure. To play the game, each player spins the pointer of his or her wheel and lets it
come to rest at random. The number of the sector in which each pointer comes to rest is called the move of that player.
Thus, player R has three possible moves and player C has four possible moves. Depending on the move each player
makes, player C then makes a payment of money to player R according to Table 1.



Figure 10.7.1 (row-wheel of player R, with three sectors; column-wheel of player C, with four sectors)

Table 1

                              Player C's Move
                          1       2       3       4
Player R's Move    1     $3      $5     -$2     -$1
                   2    -$2      $4     -$3     -$4
                   3     $6     -$5      $0      $3



For example, if the row-wheel pointer comes to rest in sector 1 (player R makes move 1), and the column-wheel pointer
comes to rest in sector 2 (player C makes move 2), then player C must pay player R the sum of $5. Some of the entries
in this table are negative, indicating that player C makes a negative payment to player R. By this we mean that player R
makes a positive payment to player C. For example, if the row-wheel shows 2 and the column-wheel shows 4, then
player R pays player C the sum of $4, because the corresponding entry in the table is -$4. In this way the positive entries
of the table are the gains of player R and the losses of player C, and the negative entries are the gains of player C and the
losses of player R.

In this game the players have no control over their moves; each move is determined by chance. However, if each player 
can decide whether he or she wants to play, then each would want to know how much he or she can expect to win or lose 
over the long term if he or she chooses to play. (Later in the section we will discuss this question and also consider a 
more complicated situation in which the players can exercise some control over their moves by varying the sectors of 
their wheels.) 



Two-Person Zero-Sum Matrix Games 

The game described above is an example of a two-person zero-sum matrix game. The term zero-sum means that in each 
play of the game, the positive gain of one player is equal to the negative gain (loss) of the other player. That is, the sum 
of the two gains is zero. The term matrix game is used to describe a two-person game in which each player has only a 
finite number of moves, so that all possible outcomes of each play, and the corresponding gains of the players, can be 
displayed in tabular or matrix form, as in Table 1 . 

In a general game of this type, let player R have m possible moves and let player C have n possible moves. In a play of
the game, each player makes one of his or her possible moves, and then a payoff is made from player C to player R,
depending on the moves. For $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, let us set

$a_{ij}$ = payoff that player C makes to player R if player R
makes move i and player C makes move j

This payoff need not be money; it can be any type of commodity to which we can attach a numerical value. As before, if
an entry $a_{ij}$ is negative, we mean that player C receives a payoff of $|a_{ij}|$ from player R. We arrange these mn possible
payoffs in the form of an $m \times n$ matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

which we will call the payoff matrix of the game.

Each player is to make his or her moves on a probabilistic basis. For example, for the game discussed in the
introduction, the ratio of the area of a sector to the area of the wheel would be the probability that the player makes the
move corresponding to that sector. Thus, from Figure 10.7.1, we see that player R would make move 2 with probability
$\tfrac13$ and player C would make move 2 with probability $\tfrac14$. In the general case we make the following definitions:

$p_i$ = probability that player R makes move i   ($i = 1, 2, \ldots, m$)
$q_j$ = probability that player C makes move j   ($j = 1, 2, \ldots, n$)

It follows from these definitions that

\[
p_1 + p_2 + \cdots + p_m = 1
\]

and

\[
q_1 + q_2 + \cdots + q_n = 1
\]

With the probabilities $p_i$ and $q_j$ we form two vectors:

\[
\mathbf{p} = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix}
\quad \text{and} \quad
\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}
\]

We call the row vector p the strategy of player R and the column vector q the strategy of player C. For example, from
Figure 10.7.1 we have

\[
\mathbf{p} = \begin{bmatrix} \tfrac16 & \tfrac13 & \tfrac12 \end{bmatrix}
\quad \text{and} \quad
\mathbf{q} = \begin{bmatrix} \tfrac14 \\ \tfrac14 \\ \tfrac13 \\ \tfrac16 \end{bmatrix}
\]

for the carnival game described earlier.

From the theory of probability, if the probability that player R makes move i is $p_i$, and independently the probability that
player C makes move j is $q_j$, then $p_i q_j$ is the probability that for any one play of the game, player R makes move i and
player C makes move j. The payoff to player R for such a pair of moves is $a_{ij}$. If we multiply each possible payoff by its
corresponding probability and sum over all possible payoffs, we obtain the expression

\[
a_{11} p_1 q_1 + a_{12} p_1 q_2 + \cdots + a_{1n} p_1 q_n + a_{21} p_2 q_1 + \cdots + a_{mn} p_m q_n \tag{1}
\]

Equation 1 is a weighted average of the payoffs to player R; each payoff is weighted according to the probability of its
occurrence. In the theory of probability, this weighted average is called the expected payoff to player R. It can be shown
that if the game is played many times, the long-term average payoff per play to player R is given by this expression. We
denote this expected payoff by $E(\mathbf{p}, \mathbf{q})$ to emphasize the fact that it depends on the strategies of the two players. From
the definition of the payoff matrix A and the strategies p and q, it can be verified that we may express the expected
payoff in matrix notation as

\[
E(\mathbf{p}, \mathbf{q}) =
\begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}
= \mathbf{p} A \mathbf{q} \tag{2}
\]

Because $E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player R, it follows that $-E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player C.



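EXAMPLE 1 Expected Payoff to Player R

For the carnival game described earlier, we have

$$E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q} = \left[\,\tfrac{1}{6} \;\; \tfrac{1}{3} \;\; \tfrac{1}{2}\,\right] \begin{bmatrix} 3 & 5 & -2 & -1 \\ -2 & 4 & -3 & -4 \\ 6 & -5 & 0 & 3 \end{bmatrix} \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{3} \\ \tfrac{1}{6} \end{bmatrix} = \tfrac{13}{72} \approx .1805$$

Thus, in the long run, player R can expect to receive an average of about 18 cents from player C in each 
play of the game. 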



So far we have been discussing the situation in which each player has a predetermined strategy. We will now consider 
the more difficult situation in which both players can change their strategies independently. For example, in the game 
described in the introduction, we would allow both players to alter the areas of the sectors of their wheels and thereby 
control the probabilities of their respective moves. This qualitatively changes the nature of the problem and puts us 
firmly in the field of true game theory. It is understood that neither player knows what strategy the other will choose. It 
is also assumed that each player will make the best possible choice of strategy and that the other player knows this. 
Thus, player R attempts to choose a strategy $\mathbf{p}$ such that $E(\mathbf{p}, \mathbf{q})$ is as large as possible for the best strategy $\mathbf{q}$ that player 
C can choose; and similarly, player C attempts to choose a strategy $\mathbf{q}$ such that $E(\mathbf{p}, \mathbf{q})$ is as small as possible for the 
best strategy $\mathbf{p}$ that player R can choose. To see that such choices are actually possible, we will need the following 
theorem, called the Fundamental Theorem of Two-Person Zero-Sum Games. (The general proof, which involves ideas 
from the theory of linear programming, will be omitted. However, below we will prove this theorem for what are called 
strictly determined games and 2 × 2 matrix games.) 

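THEOREM 10.7.1 Fundamental Theorem of Zero-Sum Games

There exist strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ such that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{3}$$

for all strategies $\mathbf{p}$ and $\mathbf{q}$.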



The strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ in this theorem are the best possible strategies for players R and C, respectively. To see why 
this is so, let $v = E(\mathbf{p}^*, \mathbf{q}^*)$. The left-hand inequality of Equation 3 then reads

$$E(\mathbf{p}^*, \mathbf{q}) \ge v \quad \text{for all strategies } \mathbf{q}$$

This means that if player R chooses the strategy $\mathbf{p}^*$, then no matter what strategy $\mathbf{q}$ player C chooses, the expected 
payoff to player R will never be below $v$. Moreover, it is not possible for player R to achieve an expected payoff greater 
than $v$. To see why, suppose there is some strategy $\mathbf{p}^{**}$ that player R can choose such that

$$E(\mathbf{p}^{**}, \mathbf{q}) > v \quad \text{for all strategies } \mathbf{q}$$

Then, in particular,

$$E(\mathbf{p}^{**}, \mathbf{q}^*) > v$$

But this contradicts the right-hand inequality of Equation 3, which requires that $v \ge E(\mathbf{p}^{**}, \mathbf{q}^*)$. Consequently, the best 
player R can do is prevent his or her expected payoff from falling below the value $v$. Similarly, the best player C can do 
is ensure that player R's expected payoff does not exceed $v$, and this can be achieved by using strategy $\mathbf{q}^*$. 

On the basis of this discussion, we arrive at the following definitions. 

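DEFINITION 1

If $\mathbf{p}^*$ and $\mathbf{q}^*$ are strategies such that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{4}$$

for all strategies $\mathbf{p}$ and $\mathbf{q}$, then

(i) $\mathbf{p}^*$ is called an optimal strategy for player R.

(ii) $\mathbf{q}^*$ is called an optimal strategy for player C.

(iii) $v = E(\mathbf{p}^*, \mathbf{q}^*)$ is called the value of the game.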

The wording in this definition suggests that optimal strategies are not necessarily unique. This is indeed the case, and in 
Exercise 2 we ask you to show this. However, it can be proved that any two sets of optimal strategies always result in 
the same value $v$ of the game. That is, if $\mathbf{p}^*, \mathbf{q}^*$ and $\mathbf{p}^{**}, \mathbf{q}^{**}$ are optimal strategies, then

$$E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}^{**}, \mathbf{q}^{**}) \tag{5}$$

The value of a game is thus the expected payoff to player R when both players choose any possible optimal strategies. 

To find optimal strategies, we must find vectors $\mathbf{p}^*$ and $\mathbf{q}^*$ that satisfy Equation 4. This is generally done by using linear 
programming techniques. Next, we discuss special cases for which optimal strategies may be found by more elementary 
techniques. 

We now introduce the following definition. 



DEFINITION 2

An entry $a_{rs}$ in a payoff matrix $A$ is called a saddle point if

(i) $a_{rs}$ is the smallest entry in its row, and

(ii) $a_{rs}$ is the largest entry in its column.

A game whose payoff matrix has a saddle point is called strictly determined. 

For example, the shaded element in each of the following payoff matrices is a saddle point: 




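If a matrix has a saddle point $a_{rs}$, it turns out that the following strategies are optimal strategies for the two players:

$$\mathbf{p}^* = [\,0 \;\; 0 \;\; \cdots \;\; 1 \;\; \cdots \;\; 0\,] \quad (1 \text{ in the } r\text{th entry}), \qquad \mathbf{q}^* = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \quad (1 \text{ in the } s\text{th entry})$$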

That is, an optimal strategy for player R is to always make the $r$th move, and an optimal strategy for player C is to 
always make the $s$th move. Such strategies for which only one move is possible are called pure strategies. Strategies for 
which more than one move is possible are called mixed strategies. To show that the above pure strategies are optimal, 
you can verify the following three equations (see Exercise 6):

$$E(\mathbf{p}^*, \mathbf{q}^*) = \mathbf{p}^*A\mathbf{q}^* = a_{rs} \tag{6}$$

$$E(\mathbf{p}^*, \mathbf{q}) = \mathbf{p}^*A\mathbf{q} \ge a_{rs} \quad \text{for any strategy } \mathbf{q} \tag{7}$$

$$E(\mathbf{p}, \mathbf{q}^*) = \mathbf{p}A\mathbf{q}^* \le a_{rs} \quad \text{for any strategy } \mathbf{p} \tag{8}$$

Together, these three equations imply that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*)$$

for all strategies $\mathbf{p}$ and $\mathbf{q}$. Because this is exactly Equation 4, it follows that $\mathbf{p}^*$ and $\mathbf{q}^*$ are optimal strategies. 

From Equation 6 the value of a strictly determined game is simply the numerical value of a saddle point $a_{rs}$. It is 
possible for a payoff matrix to have several saddle points, but then the uniqueness of the value of a game guarantees that 
the numerical values of all saddle points are the same. 

EXAMPLE 2 Optimal Strategies to Maximize a Viewing Audience

Two competing television networks, R and C, are scheduling one-hour programs in the same time period. 
Network R can schedule one of three possible programs, and network C can schedule one of four possible 
programs. Neither network knows which program the other will schedule. Both networks ask the same 
outside polling agency to give them an estimate of how all possible pairings of the programs will divide the 
viewing audience. The agency gives them each Table 2, whose $(i, j)$th entry is the percentage of the 
viewing audience that will watch network R if network R's program $i$ is paired against network C's program 
$j$. What program should each network schedule in order to maximize its viewing audience? 



Table 2

                               Network C's Program
                                1     2     3     4
Network R's Program     1      60    20    30    55
                        2      50    75    45    60
                        3      70    45    35    30



Solution Subtract 50 from each entry in Table 2 to construct the following matrix:

$$\begin{bmatrix} 10 & -30 & -20 & 5 \\ 0 & 25 & -5 & 10 \\ 20 & -5 & -15 & -20 \end{bmatrix}$$

This is the payoff matrix of the two-person zero-sum game in which each network is considered to start 
with 50% of the audience, and the $(i, j)$th entry of the matrix is the percentage of the viewing audience 
that network C loses to network R if programs $i$ and $j$ are paired against each other. It is easy to see that the 
entry

$$a_{23} = -5$$

is a saddle point of the payoff matrix. Hence, the optimal strategy of network R is to schedule program 2, 
and the optimal strategy of network C is to schedule program 3. This will result in network R's receiving 
45% of the audience and network C's receiving 55% of the audience. 
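The saddle-point test of Definition 2 is easy to mechanize. The sketch below is an illustration only (the function name `saddle_points` is an assumption, not something from the text); it scans a payoff matrix for entries that are simultaneously a row minimum and a column maximum, and is applied here to the payoff matrix of Example 2.

```python
def saddle_points(A):
    """Return (row, column, value) for every saddle point of A.

    Indices are 1-based to match the a_rs notation of the text.
    """
    col_max = [max(col) for col in zip(*A)]   # largest entry of each column
    points = []
    for r, row in enumerate(A):
        row_min = min(row)                    # smallest entry of this row
        for s, entry in enumerate(row):
            if entry == row_min and entry == col_max[s]:
                points.append((r + 1, s + 1, entry))
    return points

# Payoff matrix of Example 2 (Table 2 with 50 subtracted from each entry).
A = [[10, -30, -20, 5],
     [0, 25, -5, 10],
     [20, -5, -15, -20]]

print(saddle_points(A))   # [(2, 3, -5)]
```

An empty list means the game is not strictly determined, in which case pure strategies are not optimal and other techniques are needed.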



2x2 Matrix Games 



Another case in which the optimal strategies can be found by elementary means occurs when each player has only two 
possible moves. In this case, the payoff matrix is a 2 × 2 matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

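If the game is strictly determined, at least one of the four entries of $A$ is a saddle point, and the techniques discussed 
above can then be applied to determine optimal strategies for the two players. If the game is not strictly determined, we 
first compute the expected payoff for arbitrary strategies $\mathbf{p}$ and $\mathbf{q}$:

$$E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q} = [\,p_1 \;\; p_2\,] \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = a_{11}p_1q_1 + a_{12}p_1q_2 + a_{21}p_2q_1 + a_{22}p_2q_2 \tag{9}$$

Because

$$p_1 + p_2 = 1 \quad \text{and} \quad q_1 + q_2 = 1 \tag{10}$$

we may substitute $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$ into 9 to obtain

$$E(\mathbf{p}, \mathbf{q}) = a_{11}p_1q_1 + a_{12}p_1(1 - q_1) + a_{21}(1 - p_1)q_1 + a_{22}(1 - p_1)(1 - q_1) \tag{11}$$

If we rearrange the terms in Equation 11, we can write

$$E(\mathbf{p}, \mathbf{q}) = [(a_{11} + a_{22} - a_{12} - a_{21})p_1 - (a_{22} - a_{21})]q_1 + (a_{12} - a_{22})p_1 + a_{22} \tag{12}$$

By examining the coefficient of the $q_1$ term in 12, we see that if we set

$$p_1 = p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{13}$$

then that coefficient is zero, and 12 reduces to

$$E(\mathbf{p}^*, \mathbf{q}) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{14}$$

Equation 14 is independent of $\mathbf{q}$; that is, if player R chooses the strategy determined by 13, player C cannot change the 
expected payoff by varying his or her strategy. 

In a similar manner, it can be verified that if player C chooses the strategy determined by

$$q_1 = q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{15}$$

then substituting in 12 gives

$$E(\mathbf{p}, \mathbf{q}^*) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{16}$$

Equations 14 and 16 show that

$$E(\mathbf{p}^*, \mathbf{q}) = E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}, \mathbf{q}^*) \tag{17}$$

for all strategies $\mathbf{p}$ and $\mathbf{q}$. Thus, the strategies determined by 13, 15, and 10 are optimal strategies for players R and C, 
respectively, and so we have the following result. 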

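THEOREM 10.7.2 Optimal Strategies for a 2 × 2 Matrix Game

For a 2 × 2 game that is not strictly determined, optimal strategies for players R and C are

$$\mathbf{p}^* = \left[\, \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \;\;\; \frac{a_{11} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \,\right]$$

and

$$\mathbf{q}^* = \begin{bmatrix} \dfrac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \\[2ex] \dfrac{a_{11} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \end{bmatrix}$$

The value of the game is

$$v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}$$
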

In order to be complete, we must show that the entries in the vectors $\mathbf{p}^*$ and $\mathbf{q}^*$ are numbers strictly between 0 and 1. In 
Exercise 8 we ask you to show that this is the case as long as the game is not strictly determined. 

Equation 17 is interesting in that it implies that either player can force the expected payoff to be the value of the game 
by choosing his or her optimal strategy, regardless of which strategy the other player chooses. This is not true, in 
general, for games in which either player has more than two moves. 

EXAMPLE 3 Using Theorem 10.7.2

The federal government desires to inoculate its citizens against a certain flu virus. The virus has two 
strains, and the proportions in which the two strains occur in the virus population are not known. Two 
vaccines have been developed and each citizen is given only one of them. Vaccine 1 is 85% effective 
against strain 1 and 70% effective against strain 2. Vaccine 2 is 60% effective against strain 1 and 90% 
effective against strain 2. What inoculation policy should the government adopt? 

Solution We can consider this a two-person game in which player R (the government) desires to make 
the payoff (the fraction of citizens resistant to the virus) as large as possible, and player C (the virus) 
desires to make the payoff as small as possible. The payoff matrix is

                     Strain
                    1      2
Vaccine     1     .85    .70
            2     .60    .90

This matrix has no saddle points, so Theorem 10.7.2 is applicable. Consequently,

$$p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .60}{.85 + .90 - .70 - .60} = \frac{.30}{.45} = \frac{2}{3}$$

$$p_2^* = 1 - p_1^* = 1 - \frac{2}{3} = \frac{1}{3}$$

$$q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .70}{.85 + .90 - .70 - .60} = \frac{.20}{.45} = \frac{4}{9}$$

$$q_2^* = 1 - q_1^* = 1 - \frac{4}{9} = \frac{5}{9}$$

$$v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{(.85)(.90) - (.70)(.60)}{.85 + .90 - .70 - .60} = \frac{.345}{.45} = .7666\ldots$$

Thus, the optimal strategy for the government is to inoculate 2/3 of the citizens with vaccine 1 and 1/3 of the 
citizens with vaccine 2. This will guarantee that about 76.7% of the citizens will be resistant to a virus 
attack regardless of the distribution of the two strains. 

In contrast, a virus distribution of 4/9 of strain 1 and 5/9 of strain 2 will result in the same 76.7% of resistant 
citizens, regardless of the inoculation strategy adopted by the government (see Exercise 7). 
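The formulas of Theorem 10.7.2 can be checked numerically. The following is a minimal sketch under stated assumptions: the function name `solve_2x2_game` is invented for this illustration, the entries are supplied as exact fractions, and the function refuses strictly determined games, to which the theorem does not apply. It reproduces the numbers of Example 3.

```python
from fractions import Fraction as F

def solve_2x2_game(A):
    """Return (p_star, q_star, v) for a 2 x 2 game with no saddle point."""
    (a11, a12), (a21, a22) = A
    # Theorem 10.7.2 applies only when the game is not strictly determined.
    for r in range(2):
        for s in range(2):
            if A[r][s] == min(A[r]) and A[r][s] == max(A[0][s], A[1][s]):
                raise ValueError("saddle point found; use pure strategies")
    d = F(a11 + a22 - a12 - a21)            # common denominator
    p_star = [F(a22 - a21) / d, F(a11 - a12) / d]
    q_star = [F(a22 - a12) / d, F(a11 - a21) / d]
    v = F(a11 * a22 - a12 * a21) / d
    return p_star, q_star, v

# Payoff matrix of Example 3 (vaccine effectiveness), as exact fractions.
A = [[F(85, 100), F(70, 100)],
     [F(60, 100), F(90, 100)]]

p_star, q_star, v = solve_2x2_game(A)
print(p_star, q_star, v, float(v))
```

Here `p_star` comes out as 2/3 and 1/3, `q_star` as 4/9 and 5/9, and `v` as 23/30, which is the value .7666... found in the example.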



Exercise Set 10.7 

1. Suppose that a game has a payoff matrix 



-4 6-4 1 



(a) If players R and C use strategies 



and q = 



respectively, what is the expected payoff of the game? 

(b) If player C keeps his strategy fixed as in part (a), what strategy should player R choose to maximize his expected 

payoff? 

(c) If player R keeps her strategy fixed as in part (a), what strategy should player C choose to minimize the expected 
payoff to player R? 



Answer: 

(a) -5/8 

(b) [0 1 0] 

(c) [1 0 0 0]^ 

2. Construct a simple example to show that optimal strategies are not necessarily unique. For example, find a payoff 
matrix with several equal saddle points. 



Answer: 



Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, for example. 



3. For the strictly determined games with the following payoff matrices, find optimal strategies for the two players, and 
find the values of the games. 

(a) 



[=3] 



(b) 



(c) 



(d) 



-3 -2 

2 4 
-4 1 



2 -2 0 

-6 0 -5 

5 2 3 

-3 2 -1 

-2 -1 5 

-4 1 0 

-3 4 6 



Answer: 



, v = 3 



(b) 
(c) 

(d) 



p*=[0 1 0], 9*= [J]. v = 2 



P =[0 0 1], q = 



. v = 2 



P =[0 1 0 0], q 



v= -2 



4. For the 2 X 2 games with the following payoff matrices, find optimal strategies for the two players, and find the 
values of the games, 
(a) r 6 3 
-1 4 



(b) 
(c) 



r 40 20] 
[-10 30j 

in 



(d) [3 5 
5 2 



(e) 



[-1 -l\ 



Answer: 



(a) 



'-[5 3'I „*_ 



(b) 



P = 



P*=[l 0] 



, v = 



, v = 



(d) 



P = 



[3 3} 



27 



70 



, v = 



19 



(e) 

„*_r3 101 „*_ 



13 

12 
13 



, v= - 



29 
13 



5. Player R has two playing cards: a black ace and a red four. Player C also has two cards: a black two and a red three. 
Each player secretly selects one of his or her cards. If both selected cards are the same color, player Cpays player R 
the sum of the face values in dollars. If the cards are different colors, player R pays player C the sum of the face 
values. What are optimal strategies for both players, and what is the value of the game? 



Answer: 



[20 20 J' 



q = 



li 

20 
9_ 
20 



20 



6. Verify Equations 6, 7, and 8. 

7. Verify the statement in the last paragraph of Example 3. 

8. Show that the entries of the optimal strategies p and q given in Theorem 10.7.2 are numbers strictly between zero 
and one. 

Section 10.7 Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific 
calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for 
the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your 
technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology 
utility to solve many of the problems in the regular exercise sets. 

T1. Consider a game between two players where each player can make up to $n$ different moves $(n > 1)$. If the $i$th move 
of player R and the $j$th move of player C are such that $i + j$ is even, then C pays R $1. If $i + j$ is odd, then R pays C $1. 
Assume that both players have the same strategy, that is, $\mathbf{p}_n = [p_i]_{1 \times n}$ and $\mathbf{q}_n = [p_i]_{n \times 1}$, where 
$p_1 + p_2 + p_3 + \cdots + p_n = 1$. Use a computer to show that

$$E(\mathbf{p}_2, \mathbf{q}_2) = (p_1 - p_2)^2$$

$$E(\mathbf{p}_4, \mathbf{q}_4) = (p_1 - p_2 + p_3 - p_4)^2$$

$$E(\mathbf{p}_5, \mathbf{q}_5) = (p_1 - p_2 + p_3 - p_4 + p_5)^2$$

Using these results as a guide, prove in general that the expected payoff to player R is

$$E(\mathbf{p}_n, \mathbf{q}_n) = \left[ \sum_{j=1}^{n} (-1)^{j+1} p_j \right]^2 \ge 0$$

which shows that in the long run, player R will not lose in this game. 

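T2. Consider a game between two players where each player can make up to $n$ different moves $(n > 1)$. If both players 
make the same move, then player C pays player R $(n − 1). However, if both players make different moves, then 
player R pays player C $1. Assume that both players have the same strategy, that is, $\mathbf{p}_n = [p_i]_{1 \times n}$ and $\mathbf{q}_n = [p_i]_{n \times 1}$, 
where $p_1 + p_2 + p_3 + \cdots + p_n = 1$. Use a computer to show that

$$E(\mathbf{p}_2, \mathbf{q}_2) = \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2$$

$$\begin{aligned} E(\mathbf{p}_3, \mathbf{q}_3) = {} & \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_1 - p_3)^2 \\ & + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 + \tfrac{1}{2}(p_2 - p_3)^2 \\ & + \tfrac{1}{2}(p_3 - p_1)^2 + \tfrac{1}{2}(p_3 - p_2)^2 + \tfrac{1}{2}(p_3 - p_3)^2 \end{aligned}$$

$$\begin{aligned} E(\mathbf{p}_4, \mathbf{q}_4) = {} & \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_1 - p_3)^2 + \tfrac{1}{2}(p_1 - p_4)^2 \\ & + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 + \tfrac{1}{2}(p_2 - p_3)^2 + \tfrac{1}{2}(p_2 - p_4)^2 \\ & + \tfrac{1}{2}(p_3 - p_1)^2 + \tfrac{1}{2}(p_3 - p_2)^2 + \tfrac{1}{2}(p_3 - p_3)^2 + \tfrac{1}{2}(p_3 - p_4)^2 \\ & + \tfrac{1}{2}(p_4 - p_1)^2 + \tfrac{1}{2}(p_4 - p_2)^2 + \tfrac{1}{2}(p_4 - p_3)^2 + \tfrac{1}{2}(p_4 - p_4)^2 \end{aligned}$$

Using these results as a guide, prove in general that the expected payoff to player R is

$$E(\mathbf{p}_n, \mathbf{q}_n) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (p_i - p_j)^2 \ge 0$$

which shows that in the long run, player R will not lose in this game. 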
10.8 Leontief Economic Models 



In this section we discuss two linear models for economic systems. Some results about nonnegative matrices are applied to determine 
equilibrium price structures and outputs necessary to satisfy demand. 

Prerequisites 

Linear Systems 
Matrices 



Economic Systems 

Matrix theory has been very successful in describing the interrelations among prices, outputs, and demands in economic systems. In 
this section we discuss some simple models based on the ideas of Nobel laureate Wassily Leontief. We examine two different but 
related models: the closed or input-output model, and the open or production model. In each, we are given certain economic 
parameters that describe the interrelations between the "industries" in the economy under consideration. Using matrix theory, we then 
evaluate certain other parameters, such as prices or output levels, in order to satisfy a desired economic objective. We begin with the 
closed model. 

Leontief Closed (Input-Output) Model 

First we present a simple example; then we proceed to the general theory of the model. 

EXAMPLE 1 An Input-Output Model

Three homeowners — a carpenter, an electrician, and a plumber — agree to make repairs in their three homes. They agree 
to work a total of 10 days each according to the following schedule: 





                                               Work Performed by
                                        Carpenter   Electrician   Plumber
Days of Work in Home of Carpenter           2            1           6
Days of Work in Home of Electrician         4            5           1
Days of Work in Home of Plumber             4            4           3


For tax purposes, they must report and pay each other a reasonable daily wage, even for the work each does on his or her 
own home. Their normal daily wages are about $100, but they agree to adjust their respective daily wages so that each 
homeowner will come out even — that is, so that the total amount paid out by each is the same as the total amount each 
receives. We can set 

$p_1$ = daily wage of carpenter
$p_2$ = daily wage of electrician
$p_3$ = daily wage of plumber

To satisfy the "equilibrium" condition that each homeowner comes out even, we require that

total expenditures = total income

for each of the homeowners for the 10-day period. For example, the carpenter pays a total of $2p_1 + p_2 + 6p_3$ for the 
repairs in his own home and receives a total income of $10p_1$ for the repairs that he performs on all three homes. 
Equating these two expressions then gives the first of the following three equations:

$$\begin{aligned} 2p_1 + p_2 + 6p_3 &= 10p_1 \\ 4p_1 + 5p_2 + p_3 &= 10p_2 \\ 4p_1 + 4p_2 + 3p_3 &= 10p_3 \end{aligned}$$

The remaining two equations are the equilibrium equations for the electrician and the plumber. Dividing these equations 
by 10 and rewriting them in matrix form yields 



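$$\begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} \tag{1}$$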



Equation 1 can be rewritten as a homogeneous system by subtracting the left side from the right side to obtain 



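$$\begin{bmatrix} .8 & -.1 & -.6 \\ -.4 & .5 & -.1 \\ -.4 & -.4 & .7 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$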

The solution of this homogeneous system is found to be (verify) 



$$\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = s \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}$$

where s is an arbitrary constant. This constant is a scale factor, which the homeowners may choose for their 
convenience. For example, they can set s = 3 so that the corresponding daily wages ($93, $96, and $108) are about 
$100. 
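The homogeneous system of this example can also be solved by machine. The sketch below is one possible approach and not the method of the text: it assumes the last price is nonzero, fixes it at 1, solves the first k − 1 equilibrium equations exactly by Gauss-Jordan elimination, and then rescales to whole numbers. The helper names `solve` and `equilibrium_prices` are invented for this illustration, and `math.lcm` with several arguments needs Python 3.9 or later.

```python
from fractions import Fraction as F
from math import lcm

def solve(M, b):
    """Solve M x = b exactly for a square nonsingular M (Gauss-Jordan)."""
    n = len(M)
    aug = [[F(x) for x in row] + [F(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]          # scale pivot row
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def equilibrium_prices(E):
    """Integer solution of (I - E)p = 0, assuming the last price is nonzero."""
    k = len(E)
    # With p_k = 1, the first k-1 equations become a (k-1) x (k-1) system.
    M = [[(1 if i == j else 0) - E[i][j] for j in range(k - 1)]
         for i in range(k - 1)]
    b = [E[i][k - 1] for i in range(k - 1)]
    p = solve(M, b) + [F(1)]
    scale = lcm(*(x.denominator for x in p))              # clear denominators
    return [int(x * scale) for x in p]

# Exchange matrix of Example 1, entered as exact fractions.
E = [[F(2, 10), F(1, 10), F(6, 10)],
     [F(4, 10), F(5, 10), F(1, 10)],
     [F(4, 10), F(4, 10), F(3, 10)]]

print(equilibrium_prices(E))   # [31, 32, 36]
```

Any positive multiple of the returned vector is also an equilibrium price vector, which corresponds to the scale factor s in the example.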



This example illustrates the salient features of the Leontief input-output model of a closed economy. In the basic Equation 1, each 
column sum of the coefficient matrix is 1, corresponding to the fact that each of the homeowners' "output" of labor is completely 
distributed among these same homeowners in the proportions given by the entries in the column. Our problem is to determine suitable 
"prices" for these outputs so as to put the system in equilibrium, that is, so that each homeowner's total expenditures equal his or her 
total income. 

In the general model we have an economic system consisting of a finite number of "industries," which we number as industries 
1, 2, ..., k. Over some fixed period of time, each industry produces an "output" of some good or service that is completely utilized in a 
predetermined manner by the k industries. An important problem is to find suitable "prices" to be charged for these k outputs so that 
for each industry, total expenditures equal total income. Such a price structure represents an equilibrium position for the economy. 

For the fixed time period in question, let us set 

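$p_i$ = price charged by the $i$th industry for its total output

$e_{ij}$ = fraction of the total output of the $j$th industry purchased by the $i$th industry

for $i, j = 1, 2, \ldots, k$. By definition, we have

(i) $p_i \ge 0, \quad i = 1, 2, \ldots, k$

(ii) $e_{ij} \ge 0, \quad i, j = 1, 2, \ldots, k$

(iii) $e_{1j} + e_{2j} + \cdots + e_{kj} = 1, \quad j = 1, 2, \ldots, k$

With these quantities, we form the price vector

$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix}$$

and the exchange matrix or input-output matrix

$$E = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1k} \\ e_{21} & e_{22} & \cdots & e_{2k} \\ \vdots & \vdots & & \vdots \\ e_{k1} & e_{k2} & \cdots & e_{kk} \end{bmatrix}$$

Condition (iii) expresses the fact that all the column sums of the exchange matrix are 1. 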

As in the example, in order that the expenditures of each industry be equal to its income, the following matrix equation must be 
satisfied [see 1]:

$$E\mathbf{p} = \mathbf{p} \tag{2}$$

or

$$(I - E)\mathbf{p} = \mathbf{0} \tag{3}$$

Equation 3 is a homogeneous linear system for the price vector $\mathbf{p}$. It will have a nontrivial solution if and only if the determinant of its 
coefficient matrix $I - E$ is zero. In Exercise 7 we ask you to show that this is the case for any exchange matrix $E$. Thus, 3 always has 
nontrivial solutions for the price vector $\mathbf{p}$. 

Actually, for our economic model to make sense, we need more than just the fact that 3 has nontrivial solutions for $\mathbf{p}$. We also need the 
prices $p_i$ of the k outputs to be nonnegative numbers. We express this condition as $\mathbf{p} \ge 0$. (In general, if $A$ is any vector or matrix, the 
notation $A \ge 0$ means that every entry of $A$ is nonnegative, and the notation $A > 0$ means that every entry of $A$ is positive. Similarly, 
$A \ge B$ means $A - B \ge 0$, and $A > B$ means $A - B > 0$.) To show that 3 has a nontrivial solution for which $\mathbf{p} \ge 0$ is a bit more difficult 
than showing merely that some nontrivial solution exists. But it is true, and we state this fact without proof in the following theorem. 



THEOREM 10.8.1 

If $E$ is an exchange matrix, then $E\mathbf{p} = \mathbf{p}$ always has a nontrivial solution $\mathbf{p}$ whose entries are nonnegative. 



Let us consider a few simple examples of this theorem. 

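EXAMPLE 2 Using Theorem 10.8.1

Let

$$E = \begin{bmatrix} \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 \end{bmatrix}$$

Then $(I - E)\mathbf{p} = \mathbf{0}$ is

$$\begin{bmatrix} \tfrac{1}{2} & 0 \\ -\tfrac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

which has the general solution

$$\mathbf{p} = s \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

where s is an arbitrary constant. We then have nontrivial solutions $\mathbf{p} \ge 0$ for any $s > 0$. 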



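EXAMPLE 3 Using Theorem 10.8.1

Let

$$E = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Then $(I - E)\mathbf{p} = \mathbf{0}$ has the general solution

$$\mathbf{p} = s \begin{bmatrix} 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

where s and t are independent arbitrary constants. Nontrivial solutions $\mathbf{p} \ge 0$ then result from any $s \ge 0$ and $t \ge 0$, not 
both zero. 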



Example 2 indicates that in some situations one of the prices must be zero in order to satisfy the equilibrium condition. Example 3 
indicates that there may be several linearly independent price structures available. Neither of these situations describes a truly 
interdependent economic structure. The following theorem gives sufficient conditions for both cases to be excluded. 



THEOREM 10.8.2 

Let $E$ be an exchange matrix such that for some positive integer $m$ all the entries of $E^m$ are positive. Then there is exactly one 
linearly independent solution of $(I - E)\mathbf{p} = \mathbf{0}$, and it may be chosen so that all its entries are positive. 



We will not give a proof of this theorem. If you have read Section 10.5 on Markov chains, observe that this theorem is essentially the 
same as Theorem 10.5.4. What we are calling exchange matrices in this section were called stochastic or Markov matrices in Section 
10.5. 

EXAMPLE 4 Using Theorem 10.8.2

The exchange matrix in Example 1 was

$$E = \begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}$$

Because $E > 0$, the condition $E^m > 0$ in Theorem 10.8.2 is satisfied for $m = 1$. Consequently, we are guaranteed that 
there is exactly one linearly independent solution of $(I - E)\mathbf{p} = \mathbf{0}$, and it can be chosen so that $\mathbf{p} > 0$. In that example, 
we found that

$$\mathbf{p} = \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}$$

is such a solution. 
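The hypothesis of Theorem 10.8.2 can be tested by forming successive powers of the exchange matrix. The sketch below is only an illustration: the names `matmul` and `first_positive_power` are invented here, and the cutoff `max_power=20` is an arbitrary assumption, so a result of `None` means only that no positive power was found up to that cutoff.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def first_positive_power(E, max_power=20):
    """Smallest m <= max_power with every entry of E^m positive, else None."""
    power = E
    for m in range(1, max_power + 1):
        if all(entry > 0 for row in power for entry in row):
            return m
        power = matmul(power, E)
    return None

# Exchange matrix of Example 1: already positive, so m = 1.
E1 = [[.2, .1, .6], [.4, .5, .1], [.4, .4, .3]]
# Identity matrix of Example 3: no power is ever positive.
E3 = [[1, 0], [0, 1]]

print(first_positive_power(E1))   # 1
print(first_positive_power(E3))   # None
```

Zero entries stay exactly zero in floating-point products of nonnegative numbers, so the strict comparison with 0 is safe for matrices of this kind.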



Leontief Open (Production) Model 

In contrast with the closed model, in which the outputs of k industries are distributed only among themselves, the open model attempts 
to satisfy an outside demand for the outputs. Portions of these outputs can still be distributed among the industries themselves, to keep 
them operating, but there is to be some excess, some net production, with which to satisfy the outside demand. In the closed model the 
outputs of the industries are fixed, and our objective is to determine prices for these outputs so that the equilibrium condition, that 
expenditures equal incomes, is satisfied. In the open model it is the prices that are fixed, and our objective is to determine levels of the 
outputs of the industries needed to satisfy the outside demand. We will measure the levels of the outputs in terms of their economic 
values using the fixed prices. To be precise, over some fixed period of time, let 



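$x_i$ = monetary value of the total output of the $i$th industry

$d_i$ = monetary value of the output of the $i$th industry needed to satisfy the outside demand

$c_{ij}$ = monetary value of the output of the $i$th industry needed by the $j$th industry to produce one unit of monetary value of its own output

With these quantities, we define the production vector

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$$

the demand vector

$$\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_k \end{bmatrix}$$

and the consumption matrix

$$C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1k} \\ c_{21} & c_{22} & \cdots & c_{2k} \\ \vdots & \vdots & & \vdots \\ c_{k1} & c_{k2} & \cdots & c_{kk} \end{bmatrix}$$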

By their nature, we have that

$$\mathbf{x} \ge 0, \quad \mathbf{d} \ge 0, \quad \text{and} \quad C \ge 0$$

From the definition of $c_{ij}$ and $x_j$, it can be seen that the quantity

$$c_{i1}x_1 + c_{i2}x_2 + \cdots + c_{ik}x_k$$

is the value of the output of the $i$th industry needed by all k industries to produce a total output specified by the production vector $\mathbf{x}$. 
Because this quantity is simply the $i$th entry of the column vector $C\mathbf{x}$, we can say further that the $i$th entry of the column vector

$$\mathbf{x} - C\mathbf{x}$$

is the value of the excess output of the $i$th industry available to satisfy the outside demand. The value of the outside demand for the 
output of the $i$th industry is the $i$th entry of the demand vector $\mathbf{d}$. Consequently, we are led to the following equation

$$\mathbf{x} - C\mathbf{x} = \mathbf{d}$$

or

$$(I - C)\mathbf{x} = \mathbf{d} \tag{4}$$

for the demand to be exactly met, without any surpluses or shortages. Thus, given $C$ and $\mathbf{d}$, our objective is to find a production vector 
$\mathbf{x} \ge 0$ that satisfies Equation 4. 

EXAMPLE 5 Production Vector for a Town



A town has three main industries: a coal-mining operation, an electric power-generating plant, and a local railroad. To 
mine $1 of coal, the mining operation must purchase $.25 of electricity to run its equipment and $.25 of transportation 
for its shipping needs. To produce $1 of electricity, the generating plant requires $.65 of coal for fuel, $.05 of its own 
electricity to run auxiliary equipment, and $.05 of transportation. To provide $1 of transportation, the railroad requires 
$.55 of coal for fuel and $.10 of electricity for its auxiliary equipment. In a certain week the coal-mining operation 
receives orders for $50,000 of coal from outside the town, and the generating plant receives orders for $25,000 of 
electricity from outside. There is no outside demand for the local railroad. How much must each of the three industries 
produce in that week to exactly satisfy their own demand and the outside demand? 

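Solution For the one-week period let

$x_1$ = value of total output of coal-mining operation
$x_2$ = value of total output of power-generating plant
$x_3$ = value of total output of local railroad

From the information supplied, the consumption matrix of the system is

$$C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}$$

The linear system $(I - C)\mathbf{x} = \mathbf{d}$ is then

$$\begin{bmatrix} 1.00 & -.65 & -.55 \\ -.25 & .95 & -.10 \\ -.25 & -.05 & 1.00 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}$$

The coefficient matrix on the left is invertible, and the solution is given by

$$\mathbf{x} = (I - C)^{-1}\mathbf{d} = \frac{1}{503} \begin{bmatrix} 756 & 542 & 470 \\ 220 & 690 & 190 \\ 200 & 170 & 630 \end{bmatrix} \begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix} = \begin{bmatrix} 102{,}087 \\ 56{,}163 \\ 28{,}330 \end{bmatrix}$$
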

Thus, the total output of the coal-mining operation should be $102,087, the total output of the power-generating plant 
should be $56,163, and the total output of the railroad should be $28,330. 
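The production vector of Example 5 can be reproduced by solving Equation 4 directly. The following is a minimal sketch, assuming exact rational arithmetic and a nonsingular I − C; the helper names `solve` and `production_vector` are invented for this illustration and are not part of the text.

```python
from fractions import Fraction as F

def solve(M, b):
    """Solve M x = b exactly for a square nonsingular M (Gauss-Jordan)."""
    n = len(M)
    aug = [[F(x) for x in row] + [F(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if piv is None:
            raise ValueError("I - C is singular")
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]          # scale pivot row
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def production_vector(C, d):
    """Return x with (I - C) x = d for consumption matrix C and demand d."""
    k = len(C)
    i_minus_c = [[(1 if i == j else 0) - C[i][j] for j in range(k)]
                 for i in range(k)]
    return solve(i_minus_c, d)

# Consumption matrix and outside demand of Example 5.
C = [[F(0), F("0.65"), F("0.55")],
     [F("0.25"), F("0.05"), F("0.10")],
     [F("0.25"), F("0.05"), F(0)]]
d = [50000, 25000, 0]

x = production_vector(C, d)
print([round(v) for v in x])   # [102087, 56163, 28330]
```

The exact entries are fractions with denominator 503, in agreement with the factor 1/503 in the inverse shown in the example; rounding to the nearest dollar gives the stated outputs.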



Let us reconsider Equation 4:

$$(I - C)\mathbf{x} = \mathbf{d}$$

If the square matrix $I - C$ is invertible, we can write

$$\mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5}$$

In addition, if the matrix $(I - C)^{-1}$ has only nonnegative entries, then we are guaranteed that for any $\mathbf{d} \ge 0$, Equation 5 has a unique 
nonnegative solution for $\mathbf{x}$. This is a particularly desirable situation, as it means that any outside demand can be met. The terminology 
used to describe this case is given in the following definition. 

DEFINITION 1

A consumption matrix $C$ is said to be productive if $(I - C)^{-1}$ exists and

$$(I - C)^{-1} \ge 0$$



We will now consider some simple criteria that guarantee that a consumption matrix is productive. The first is given in the following 
theorem. 



THEOREM 10.8.3 Productive Consumption Matrix 

A consumption matrix $C$ is productive if and only if there is some production vector $\mathbf{x} \ge 0$ such that $\mathbf{x} > C\mathbf{x}$. 



(The proof is outlined in Exercise 9.) The condition x > Cx means that there is some production schedule possible such that each 
industry produces more than it consumes. 



Theorem 10.8.3 has two interesting corollaries. Suppose that all the row sums of $C$ are less than 1. If

$$\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$

then $C\mathbf{x}$ is a column vector whose entries are these row sums. Therefore, $\mathbf{x} > C\mathbf{x}$, and the condition of Theorem 10.8.3 is satisfied. 
Thus, we arrive at the following corollary: 



COROLLARY 10.8.4 



A consumption matrix is productive if each of its row sums is less than 1 . 



As we ask you to show in Exercise 8, this corollary leads to the following: 



COROLLARY 10.8.5 

A consumption matrix is productive if each of its column sums is less than 1 . 



Recalling the definition of the entries of the consumption matrix $C$, we see that the $j$th column sum of $C$ is the total value of the outputs 
of all k industries needed to produce one unit of value of output of the $j$th industry. The $j$th industry is thus said to be profitable if that 
$j$th column sum is less than 1. In other words, Corollary 10.8.5 says that a consumption matrix is productive if all k industries in the 
economic system are profitable. 

EXAMPLE 6 Using Corollary 10.8.5

The consumption matrix in Example 5 was

$$C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}$$

All three column sums in this matrix are less than 1, so all three industries are profitable. Consequently, by Corollary 
10.8.5, the consumption matrix $C$ is productive. This can also be seen in the calculations in Example 5, as $(I - C)^{-1}$ is 
nonnegative. 
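The sufficient conditions of Corollaries 10.8.4 and 10.8.5, and the witness condition of Theorem 10.8.3, are simple enough to check by machine. The sketch below is illustrative only; the function names `sum_tests` and `is_witness` are assumptions introduced for this example. Both sum tests are sufficient but not necessary, so a matrix that fails them may still be productive.

```python
def sum_tests(C):
    """Return (rows_ok, cols_ok): whether all row sums, respectively all
    column sums, of C are less than 1 (Corollaries 10.8.4 and 10.8.5)."""
    rows_ok = all(sum(row) < 1 for row in C)
    cols_ok = all(sum(col) < 1 for col in zip(*C))
    return rows_ok, cols_ok

def is_witness(C, x):
    """True if x >= 0 and x > Cx entrywise (the condition of Theorem 10.8.3)."""
    cx = [sum(c * xj for c, xj in zip(row, x)) for row in C]
    return all(xi >= 0 for xi in x) and all(xi > ci for xi, ci in zip(x, cx))

# Consumption matrix of Example 5: only the column-sum test succeeds.
C5 = [[0, .65, .55], [.25, .05, .10], [.25, .05, 0]]
print(sum_tests(C5))               # (False, True)

# A matrix for which neither sum test applies, but x = (2, 1, 1) is a witness.
C2 = [[.7, .3, .2], [.1, .4, .3], [.2, .4, .1]]
print(sum_tests(C2))               # (False, False)
print(is_witness(C2, [2, 1, 1]))   # True
```

The second matrix shows why the theorem is stronger than its corollaries: one suitable production vector is enough to establish productivity even when some row and column sums reach or exceed 1.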



Exercise Set 10.8 

1. For the following exchange matrices, find nonnegative price vectors that satisfy the equilibrium condition 3. 



(b) 



1 


1 






2 


3 






1 


2 






2 


3 






1 


0 


1 ' 




2 




2 




1 


0 


1 




3 


2 




1 

6 


1 


0 




.35 


.50 


.30 


.25 


.20 


.30 


.40 


.30 


.40 



Answer: 



(a) 
(b) 



(c) 



[78 
54 
79 



2. Using Theorem 10.8.3 and its corollaries, show that each of the following consumption matrices is productive. 

r 



(a) 
(b) 

(c) 



3 .6 

.70 .30 .25 
.20 .40 .25 

.05 .15 .25 

.7 .3 .2 
.1 .4 .3 
.2 .4 .1 



Answer:

(a) Use Corollary 10.8.4; all row sums are less than one.

(b) Use Corollary 10.8.5; all column sums are less than one.

(c) Use Theorem 10.8.3, with $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} > C\mathbf{x} = \begin{bmatrix} 1.9 \\ .9 \\ .9 \end{bmatrix}$.



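3. Using Theorem 10.8.2, show that there is only one linearly independent price vector for the closed economic system with exchange
matrix

$$E = \begin{bmatrix} 0 & .2 & .5 \\ 1 & .2 & .5 \\ 0 & .6 & 0 \end{bmatrix}$$

Answer:

$$E^2 = \begin{bmatrix} .2 & .34 & .1 \\ .2 & .54 & .6 \\ .6 & .12 & .3 \end{bmatrix}$$

has all positive entries.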

4. Three neighbors have backyard vegetable gardens. Neighbor A grows tomatoes, neighbor B grows corn, and neighbor C grows
lettuce. They agree to divide their crops among themselves as follows: A gets 1/2 of the tomatoes, 1/3 of the corn, and 1/4 of the
lettuce. B gets 1/3 of the tomatoes, 1/3 of the corn, and 1/4 of the lettuce. C gets 1/6 of the tomatoes, 1/3 of the corn, and 1/2 of the lettuce.
What prices should the neighbors assign to their respective crops if the equilibrium condition of a closed economy is to be satisfied,
and if the lowest-priced crop is to have a price of $100?

Answer:

Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67

5. Three engineers, a civil engineer (CE), an electrical engineer (EE), and a mechanical engineer (ME), each have a consulting firm.
The consulting they do is of a multidisciplinary nature, so they buy a portion of each other's services. For each $1 of consulting the
CE does, she buys $.10 of the EE's services and $.30 of the ME's services. For each $1 of consulting the EE does, she buys $.20 of 
the CE's services and $.40 of the ME's services. And for each $1 of consulting the ME does, she buys $.30 of the CE's services and 
$.40 of the EE's services. In a certain week the CE receives outside consulting orders of $500, the EE receives outside consulting 
orders of $700, and the ME receives outside consulting orders of $600. What dollar amount of consulting does each engineer 
perform in that week? 



Answer: 



$1256 for the CE, $1448 for the EE, $1556 for the ME 

6. (a) Suppose that the demand for the output of the ith industry increases by one unit. Explain why the ith column of the matrix
$(I - C)^{-1}$ is the increase that must be made to the production vector x to satisfy this additional demand.

(b) Referring to Example 5, use the result in part (a) to determine the increase in the value of the output of the coal-mining
operation needed to satisfy a demand of one additional unit in the value of the output of the power-generating plant.

Answer:

(b) 542/503


7. Using the fact that the column sums of an exchange matrix E are all 1, show that the column sums of $I - E$ are zero. From this,
show that $I - E$ has zero determinant, and so $(I - E)\mathbf{p} = \mathbf{0}$ has nontrivial solutions for p.

8. Show that Corollary 10.8.5 follows from Corollary 10.8.4.

[Hint: Use the fact that $(A^T)^{-1} = (A^{-1})^T$ for any invertible matrix A.]

9. (Calculus required) Prove Theorem 10.8.3 as follows:

(a) Prove the "only if" part of the theorem; that is, show that if C is a productive consumption matrix, then there is a vector $\mathbf{x} \ge \mathbf{0}$
such that $\mathbf{x} > C\mathbf{x}$.

(b) Prove the "if" part of the theorem as follows:

Step 1 Show that if there is a vector $\mathbf{x}^* \ge \mathbf{0}$ such that $C\mathbf{x}^* < \mathbf{x}^*$, then $\mathbf{x}^* > \mathbf{0}$.
Step 2 Show that there is a number $\lambda$ such that $0 < \lambda < 1$ and $C\mathbf{x}^* < \lambda\mathbf{x}^*$.
Step 3 Show that $C^n\mathbf{x}^* < \lambda^n\mathbf{x}^*$ for n = 1, 2, ....
Step 4 Show that $C^n \to 0$ as $n \to \infty$.
Step 5 By multiplying out, show that

$$(I - C)(I + C + C^2 + \cdots + C^{n-1}) = I - C^n$$

for n = 1, 2, ....

Step 6 By letting $n \to \infty$ in Step 5, show that the matrix infinite sum

$$S = I + C + C^2 + \cdots$$

exists and that $(I - C)S = I$.
Step 7 Show that $S \ge 0$ and that $S = (I - C)^{-1}$.
Step 8 Show that C is a productive consumption matrix.



Section 10.8 Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, 
Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of 
these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. 

T1. Consider a sequence of exchange matrices $\{E_2, E_3, E_4, E_5, \ldots, E_n\}$, where



£2 = 



1 i 



0 ^ ^ 



1 0 



0 ^ ^ 



£4 = 



i 
3 

lo 4 



0 4 4 



-044 



4 0 4 



i 

4 



0 4 4 



and so on. Use a computer to show that $E_2^2 > O_2$, $E_3^3 > O_3$, $E_4^4 > O_4$, $E_5^5 > O_5$, and make the conjecture that although $E_n^n > O_n$ is true,
$E_n^k > O_n$ is not true for k = 1, 2, 3, ..., n - 1. Next, use a computer to determine the vectors $\mathbf{p}_n$ such that $E_n\mathbf{p}_n = \mathbf{p}_n$
(for n = 2, 3, 4, 5, 6), and then see if you can discover a pattern that would allow you to compute $\mathbf{p}_{n+1}$ easily from $\mathbf{p}_n$. Test your discovery by first
constructing $\mathbf{p}_8$ from

$$\mathbf{p}_7 = \begin{bmatrix} 2520 \\ 3360 \\ 1890 \\ 672 \\ 175 \\ 36 \\ 7 \end{bmatrix}$$

and then checking to see whether $E_8\mathbf{p}_8 = \mathbf{p}_8$.

T2. Consider an open production model having n industries with n > 1. In order to produce $1 of its own output, the jth industry must
spend $(1/n) for the output of the ith industry (for all i ≠ j), but the jth industry (for all j = 1, 2, 3, ..., n) spends nothing for its own
output. Construct the consumption matrix $C_n$, show that it is productive, and determine an expression for $(I_n - C_n)^{-1}$. In
determining an expression for $(I_n - C_n)^{-1}$, use a computer to study the cases when n = 2, 3, 4, and 5; then make a conjecture and
prove your conjecture to be true. [Hint: If $F_n = [1]_{n \times n}$ (i.e., the n × n matrix with every entry equal to 1), first show that

$$F_n^2 = nF_n$$

and then express your value of $(I_n - C_n)^{-1}$ in terms of n, $I_n$, and $F_n$.]

10.9 Forest Management



In this section we discuss a matrix model for the management of a forest where trees are grouped into classes according to height. 
The optimal sustainable yield of a periodic harvest is calculated when the trees of different height classes can have different 
economic values. 



Prerequisites 

Matrix Operations 



Our objective is to introduce a simplified model for the sustainable harvesting of a forest whose trees are classified by height. The 
height of a tree is assumed to determine its economic value when it is cut down and sold. Initially, there is a distribution of trees 
of various heights. The forest is then allowed to grow for a certain period of time, after which some of the trees of various heights 
are harvested. The trees left unharvested are to be of the same height configuration as the original forest, so that the harvest is 
sustainable. As we will see, there are many such sustainable harvesting procedures. We want to find one for which the total 
economic value of all the trees removed is as large as possible. This determines the optimal sustainable yield of the forest and is 
the largest yield that can be attained continually without depleting the forest. 



Suppose that a harvester has a forest of Douglas fir trees that are to be sold as Christmas trees year after year. Every December 
the harvester cuts down some of the trees to be sold. For each tree cut down, a seedling is planted in its place. In this way the total 
number of trees in the forest is always the same. (In this simplified model, we will not take into account trees that die between 
harvests. We assume that every seedling planted survives and grows until it is harvested.) 

In the marketplace, trees of different heights have different economic values. Suppose that there are n different price classes
corresponding to certain height intervals, as shown in Table 1 and Figure 10.9.1. The first class consists of seedlings with heights
in the interval $[0, h_1)$, and these seedlings are of no economic value. The nth class consists of trees with heights greater than or
equal to $h_{n-1}$.



Optimal Sustainable Yield 



The Model 




Figure 10.9.1 (vertical axis: value of tree)

Table 1

Class            Value (dollars)    Height interval
1 (seedlings)    None               [0, h_1)
2                p_2                [h_1, h_2)
3                p_3                [h_2, h_3)
...              ...                ...
n - 1            p_{n-1}            [h_{n-2}, h_{n-1})
n                p_n                [h_{n-1}, ∞)


Let $x_i$ (i = 1, 2, ..., n) be the number of trees within the ith class that remain after each harvest. We form a column vector with
the numbers and call it the nonharvest vector:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$


For a sustainable harvesting policy, the forest is to be returned after each harvest to the fixed configuration given by the 
nonharvest vector x. Part of our problem is to find those nonharvest vectors x for which sustainable harvesting is possible. 

Because the total number of trees in the forest is fixed, we can set

$$x_1 + x_2 + \cdots + x_n = s \tag{1}$$


where s is predetermined by the amount of land available and the amount of space each tree requires. Referring to Figure 10.9.2, 
we have the following situation. The forest configuration is given by the vector x after each harvest. Between harvests the trees 
grow and produce a new forest configuration before each harvest. A certain number of trees are removed from each class at the 
harvest. Finally, a seedling is planted in place of each tree removed, to return the forest again to the configuration x. 



Figure 10.9.2 Harvest cycle: forest before growth (nonharvest vector x); forest after growth; trees removed and trees not removed; forest after harvest (nonharvest vector x, the same forest configuration).



Consider first the growth of the forest between harvests. During this period a tree in the ith class may grow and move up to a
higher height class. Or its growth may be retarded for some reason, and it will remain in the same class. We consequently define
the following growth parameters $g_i$ for i = 1, 2, ..., n - 1:

$g_i$ = the fraction of trees in the ith class that grow into the (i + 1)-st class during a growth period

For simplicity we assume that a tree can move at most one height class upward in one growth period. With this assumption, we
have

$1 - g_i$ = the fraction of trees in the ith class that remain in the ith class during a growth period

With these n - 1 growth parameters, we form the following n × n growth matrix:

$$G = \begin{bmatrix}
1 - g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & 1 - g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & 1 - g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 - g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 1
\end{bmatrix} \tag{2}$$

Because the entries of the vector x are the numbers of trees in the n classes before the growth period, you can verify that the
entries of the vector

$$G\mathbf{x} = \begin{bmatrix}
(1 - g_1)x_1 \\
g_1x_1 + (1 - g_2)x_2 \\
g_2x_2 + (1 - g_3)x_3 \\
\vdots \\
g_{n-2}x_{n-2} + (1 - g_{n-1})x_{n-1} \\
g_{n-1}x_{n-1} + x_n
\end{bmatrix} \tag{3}$$

are the numbers of trees in the n classes after the growth period.

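Suppose that during the harvest we remove $y_i$ (i = 1, 2, ..., n) trees from the ith class. We will call the column vector

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

the harvest vector. Thus, a total of

$$y_1 + y_2 + \cdots + y_n$$

trees are removed at each harvest. This is also the total number of trees added to the first class (the new seedlings) after each
harvest. If we define the following n × n replacement matrix

$$R = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix} \tag{4}$$

then the column vector

$$R\mathbf{y} = \begin{bmatrix} y_1 + y_2 + \cdots + y_n \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{5}$$

specifies the configuration of trees planted after each harvest.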



At this point we are ready to write the following equation, which characterizes a sustainable harvesting policy:

[configuration at end of growth period] - [harvest] + [new seedling replacement] = [configuration at beginning of growth period]

or mathematically,

$$G\mathbf{x} - \mathbf{y} + R\mathbf{y} = \mathbf{x}$$

This equation can be rewritten as

$$(I - R)\mathbf{y} = (G - I)\mathbf{x} \tag{6}$$

or more comprehensively as

$$\begin{bmatrix}
0 & -1 & -1 & \cdots & -1 & -1 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
=
\begin{bmatrix}
-g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & -g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & -g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}$$


We will refer to Equation 6 as the sustainable harvesting condition. Any vectors x and y with nonnegative entries, and such that
$x_1 + x_2 + \cdots + x_n = s$, which satisfy this matrix equation, determine a sustainable harvesting policy for the forest. Note that
if $y_1 > 0$, then the harvester is removing seedlings of no economic value and replacing them with new seedlings. Because there is
no point in doing this, we assume that

$$y_1 = 0 \tag{7}$$

With this assumption, it can be verified that 6 is the matrix form of the following set of equations:

$$\begin{aligned}
y_2 + y_3 + \cdots + y_n &= g_1x_1 \\
y_2 &= g_1x_1 - g_2x_2 \\
y_3 &= g_2x_2 - g_3x_3 \\
&\ \ \vdots \\
y_{n-1} &= g_{n-2}x_{n-2} - g_{n-1}x_{n-1} \\
y_n &= g_{n-1}x_{n-1}
\end{aligned} \tag{8}$$

Note that the first equation in 8 is the sum of the remaining n - 1 equations.
Because we must have $y_i \ge 0$ for i = 2, 3, ..., n, Equations 8 require that

$$g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0 \tag{9}$$

Conversely, if x is a column vector with nonnegative entries that satisfy Equation 9, then 7 and 8 define a column vector y with 
nonnegative entries. Furthermore, x and y then satisfy the sustainable harvesting condition 6. In other words, a necessary and 
sufficient condition for a nonnegative column vector x to determine a forest configuration that is capable of sustainable 
harvesting is that its entries satisfy 9. 
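The relationship between a nonharvest vector and its harvest vector can be sketched in a few lines. This is only an illustration under assumed data: the function names `harvest_vector` and `is_sustainable` and the sample values of g and x are hypothetical. The first function applies Equations 7 and 8, and the second tests the chain of inequalities 9.

```python
def harvest_vector(g, x):
    """Equations 7 and 8: y_1 = 0, y_i = g_{i-1} x_{i-1} - g_i x_i, y_n = g_{n-1} x_{n-1}."""
    n = len(x)
    flow = [g[i] * x[i] for i in range(n - 1)]  # g_i x_i: trees leaving class i by growth
    y = [0.0] * n
    for i in range(1, n - 1):
        y[i] = flow[i - 1] - flow[i]
    y[n - 1] = flow[n - 2]
    return y

def is_sustainable(g, x):
    """Condition 9: g_1 x_1 >= g_2 x_2 >= ... >= g_{n-1} x_{n-1} >= 0."""
    flow = [gi * xi for gi, xi in zip(g, x)]  # zip stops after the n-1 growth parameters
    return all(a >= b for a, b in zip(flow, flow[1:])) and flow[-1] >= 0

# Hypothetical three-class forest.
g = [0.5, 0.25]
x_good = [40.0, 40.0, 20.0]
x_bad = [10.0, 40.0, 50.0]

y_good = harvest_vector(g, x_good)
print(is_sustainable(g, x_good), y_good)
print(is_sustainable(g, x_bad), harvest_vector(g, x_bad))
```

For the second configuration the computed harvest vector contains a negative entry, which is exactly what condition 9 rules out.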



Optimal Sustainable Yield 

Because we remove $y_i$ trees from the ith class (i = 2, 3, ..., n) and each tree in the ith class has an economic value of $p_i$, the
total yield of the harvest, Yld, is given by

$$Yld = p_2y_2 + p_3y_3 + \cdots + p_ny_n \tag{10}$$

Using 8, we may substitute for the $y_i$'s in 10 to obtain

$$Yld = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1} \tag{11}$$

Combining 11, 1, and 9, we can now state the problem of maximizing the yield of the forest over all possible sustainable
harvesting policies as follows:

Problem

Find nonnegative numbers $x_1, x_2, \ldots, x_n$ that maximize

$$Yld = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1}$$

subject to

$$x_1 + x_2 + \cdots + x_n = s$$

and

$$g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0$$



As formulated above, this problem belongs to the field of linear programming. However, we will illustrate the following result, 
without linear programming theory, by actually exhibiting a sustainable harvesting policy. 



THEOREM 10.9.1 Optimal Sustainable Yield 

The optimal sustainable yield is achieved by harvesting all the trees from one particular height class and none of the trees 
from any other height class. 



Let us first set

$Yld_k$ = yield obtained by harvesting all of the kth class and none of the other classes

The largest value of $Yld_k$ for k = 2, 3, ..., n will then be the optimal sustainable yield, and the corresponding value of k will be
the class that should be completely harvested to attain the optimal sustainable yield. Because no class but the kth is harvested, we
have

$$y_2 = y_3 = \cdots = y_{k-1} = y_{k+1} = \cdots = y_n = 0 \tag{12}$$

In addition, because all of the kth class is harvested, no trees are ever present in the height classes above the kth class. Thus,

$$x_k = x_{k+1} = \cdots = x_n = 0 \tag{13}$$

Substituting 12 and 13 into the sustainable harvesting condition 8 gives

$$\begin{aligned}
y_k &= g_1x_1 \\
0 &= g_1x_1 - g_2x_2 \\
0 &= g_2x_2 - g_3x_3 \\
&\ \ \vdots \\
0 &= g_{k-2}x_{k-2} - g_{k-1}x_{k-1} \\
y_k &= g_{k-1}x_{k-1}
\end{aligned} \tag{14}$$

Equations 14 can also be written as

$$y_k = g_1x_1 = g_2x_2 = \cdots = g_{k-1}x_{k-1} \tag{15}$$

from which it follows that

$$\begin{aligned}
x_2 &= g_1x_1/g_2 \\
x_3 &= g_1x_1/g_3 \\
&\ \ \vdots \\
x_{k-1} &= g_1x_1/g_{k-1}
\end{aligned} \tag{16}$$

If we substitute Equations 13 and 16 into

$$x_1 + x_2 + \cdots + x_n = s$$

[which is Equation 1], we can solve for $x_1$ and obtain

$$x_1 = \frac{s}{1 + \dfrac{g_1}{g_2} + \dfrac{g_1}{g_3} + \cdots + \dfrac{g_1}{g_{k-1}}} \tag{17}$$

For the yield $Yld_k$, we combine 10, 12, 15, and 17 to obtain

$$\begin{aligned}
Yld_k &= p_2y_2 + p_3y_3 + \cdots + p_ny_n \\
&= p_ky_k \\
&= p_kg_1x_1 \\
&= \frac{p_ks}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\end{aligned} \tag{18}$$

Equation 18 determines $Yld_k$ in terms of the known growth and economic parameters for any k = 2, 3, ..., n. Thus, the optimal
sustainable yield is found as follows.



THEOREM 10.9.2 Finding the Optimal Sustainable Yield

The optimal sustainable yield is the largest value of

$$\frac{p_ks}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}$$

for k = 2, 3, ..., n. The corresponding value of k is the number of the class that is completely harvested.



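In Exercise 4 we ask you to show that the nonharvest vector x for the optimal sustainable yield is

$$\mathbf{x} = \frac{s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\begin{bmatrix} 1/g_1 \\ 1/g_2 \\ \vdots \\ 1/g_{k-1} \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{19}$$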



Theorem 10.9.2 implies that it is not necessarily the highest-priced class of trees that should be totally cropped. The growth 
parameters gi must also be taken into account to determine the optimal sustainable yield. 



EXAMPLE 1 Using Theorem 10.9.2

For a Scots pine forest in Scotland with a growth period of six years, the following growth matrix was found (see
M. B. Usher, "A Matrix Approach to the Management of Renewable Resources, with Special Reference to Selection
Forests," Journal of Applied Ecology, Vol. 3, 1966, pp. 355-367):

$$G = \begin{bmatrix}
.72 & 0 & 0 & 0 & 0 & 0 \\
.28 & .69 & 0 & 0 & 0 & 0 \\
0 & .31 & .75 & 0 & 0 & 0 \\
0 & 0 & .25 & .77 & 0 & 0 \\
0 & 0 & 0 & .23 & .63 & 0 \\
0 & 0 & 0 & 0 & .37 & 1.00
\end{bmatrix}$$

Suppose that the prices of trees in the five tallest height classes are

$$p_2 = \$50,\quad p_3 = \$100,\quad p_4 = \$150,\quad p_5 = \$200,\quad p_6 = \$250$$

Which class should be completely harvested to obtain the optimal sustainable yield, and what is that yield?

Solution

From matrix G we have that

$$g_1 = .28,\quad g_2 = .31,\quad g_3 = .25,\quad g_4 = .23,\quad g_5 = .37$$

Equation 18 then gives

$$\begin{aligned}
Yld_2 &= 50s/(.28^{-1}) = 14.0s \\
Yld_3 &= 100s/(.28^{-1} + .31^{-1}) = 14.7s \\
Yld_4 &= 150s/(.28^{-1} + .31^{-1} + .25^{-1}) = 13.9s \\
Yld_5 &= 200s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1}) = 13.2s \\
Yld_6 &= 250s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1} + .37^{-1}) = 14.0s
\end{aligned}$$

We see that $Yld_3$ is the largest of these five quantities, so from Theorem 10.9.2 the third class should be completely
harvested every six years to maximize the sustainable yield. The corresponding optimal sustainable yield is $14.7s,
where s is the total number of trees in the forest.
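The arithmetic in this example is easy to reproduce by machine. The sketch below is one possible way to evaluate Equation 18 for every class; the function name `yields_per_class` is a hypothetical helper, and the yields are reported per tree (that is, with s = 1).

```python
def yields_per_class(g, prices, s=1.0):
    """Evaluate Equation 18: Yld_k = p_k * s / (1/g_1 + ... + 1/g_{k-1}), k = 2, ..., n.

    g      -- growth parameters [g_1, ..., g_{n-1}]
    prices -- tree values [p_2, ..., p_n]
    Returns a dict mapping the class number k to Yld_k.
    """
    result = {}
    inv_sum = 0.0
    for k, (g_prev, p_k) in enumerate(zip(g, prices), start=2):
        inv_sum += 1.0 / g_prev          # running sum 1/g_1 + ... + 1/g_{k-1}
        result[k] = p_k * s / inv_sum
    return result

# Growth parameters and prices from the example.
g = [0.28, 0.31, 0.25, 0.23, 0.37]
prices = [50, 100, 150, 200, 250]

yields = yields_per_class(g, prices)
best_class = max(yields, key=yields.get)

for k in sorted(yields):
    print(k, round(yields[k], 1))
print("harvest class", best_class)
```

The running sum avoids recomputing the denominator for each class, so the whole table of yields is obtained in a single pass over the growth parameters.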



Exercise Set 10.9 

1. A certain forest is divided into three height classes and has a growth matrix between harvests given by 

0 0 

f \ 

If the price of trees in the second class is $30 and the price of trees in the third class is $50, which class should be completely 
harvested to attain the optimal sustainable yield? What is the optimal yield if there are 1000 trees in the forest? 

Answer: 

The second class; $15,000 

2. In Example 1 , to what level must the price of trees in the fifth class rise so that the fifth class is the one to harvest completely 
in order to attain the optimal sustainable yield? 

Answer: 

$223 

3. In Example 1, what must the ratio of the prices $p_2 : p_3 : p_4 : p_5 : p_6$ be in order that the yields $Yld_k$, k = 2, 3, 4, 5, 6, all be the
same? (In this case, any sustainable harvesting policy will produce the same optimal sustainable yield.)

Answer:

1 : 1.90 : 3.02 : 4.24 : 5.00

4. Derive Equation 19 for the nonharvest vector x corresponding to the optimal sustainable harvesting policy described in 
Theorem 10.9.2. 

5. For the optimal sustainable harvesting policy described in Theorem 10.9.2, how many trees are removed from the forest 
during each harvest? 

Answer: 

$s/(g_1^{-1} + g_2^{-1} + \cdots + g_{k-1}^{-1})$

6. If all the growth parameters $g_1, g_2, \ldots, g_{n-1}$ in the growth matrix G are equal, what should the ratio of the prices
$p_2 : p_3 : \cdots : p_n$ be in order that any sustainable harvesting policy be an optimal sustainable harvesting policy? (See Exercise 3.)

Answer:

1 : 2 : 3 : ... : n - 1

Section 10.9 Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, 
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some 
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 

T1. A particular forest has growth parameters given by

$$g_i = \frac{1}{i}$$

for i = 1, 2, 3, ..., n - 1, where n (the total number of height classes) can be chosen as large as needed. Suppose that the value of
a tree in the kth height interval is given by

$$p_k = a(k - 1)^{\rho}$$

where a is a constant (in dollars) and $\rho$ is a parameter satisfying $1 \le \rho < 2$.

(a) Show that the yield $Yld_k$ is given by

$$Yld_k = \frac{2a(k - 1)^{\rho - 1}s}{k}$$

(b) For

$\rho$ = 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9

use a computer to determine the class number that should be completely harvested, and determine the optimal sustainable
yield in each case. Make sure that you allow k to take on only integer values in your calculations.

(c) Repeat the calculations in part (b) using

$\rho$ = 1.91, 1.92, 1.93, 1.94, 1.95, 1.96, 1.97, 1.98, 1.99

(d) Show that if $\rho = 2$, then the optimal sustainable yield can never be larger than 2as.

(e) Compare the values of k determined in parts (b) and (c) to $1/(2 - \rho)$, and use some calculus to explain why



T2. A particular forest has growth parameters given by

$$g_i = \frac{1}{2^i}$$

for i = 1, 2, 3, ..., n - 1, where n (the total number of height classes) can be chosen as large as needed. Suppose that the value of
a tree in the kth height interval is given by

$$p_k = a(k - 1)^{\rho}$$

where a is a constant (in dollars) and $\rho$ is a parameter satisfying $1 \le \rho$.

(a) Show that the yield $Yld_k$ is given by

$$Yld_k = \frac{a(k - 1)^{\rho}s}{2^k - 2}$$

(b) For

$\rho$ = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

use a computer to determine the class number that should be completely harvested in order to obtain an optimal yield, and
determine the optimal sustainable yield in each case. Make sure that you allow k to take on only integer values in your
calculations.

(c) Compare the values of k determined in part (b) to $1 + \rho/\ln(2)$ and use some calculus to explain why

10.10 Computer Graphics

In this section we assume that a view of a three-dimensional object is displayed on a video screen and show 
how matrix algebra can be used to obtain new views of the object by rotation, translation, and scaling. 



Prerequisites 

Matrix Algebra 
Analytic Geometry 



Visualization of a Three-Dimensional Object



Suppose that we want to visualize a three-dimensional object by displaying various views of it on a video 
screen. The object we have in mind to display is to be determined by a finite number of straight line segments. 
As an example, consider the truncated right pyramid with hexagonal base illustrated in Figure 10.10.1. We first 
introduce an xyz-coordinate system in which to embed the object. As in Figure 10.10.1, we orient the coordinate 
system so that its origin is at the center of the video screen and the xy-plane coincides with the plane of the 
screen. Consequently, an observer will see only the projection of the view of the three-dimensional object onto 
the two-dimensional xy-plane. 




Figure 10.10.1 



In the xyz-coordinate system, the endpoints $P_1, P_2, \ldots, P_n$ of the straight line segments that determine the view
of the object will have certain coordinates, say,

$$(x_1, y_1, z_1),\quad (x_2, y_2, z_2),\quad \ldots,\quad (x_n, y_n, z_n)$$

These coordinates, together with a specification of which pairs are to be connected by straight line segments,
are to be stored in the memory of the video display system. For example, assume that the 12 vertices of the
truncated pyramid in Figure 10.10.1 have the following coordinates (the screen is 4 units wide by 3 units high):



P1: (1.000, -.800, .000),     P2: (.500, -.800, -.866),
P3: (-.500, -.800, -.866),    P4: (-1.000, -.800, .000),
P5: (-.500, -.800, .866),     P6: (.500, -.800, .866),
P7: (.840, -.400, .000),      P8: (.315, .125, -.546),
P9: (-.210, .650, -.364),     P10: (-.360, .800, .000),
P11: (-.210, .650, .364),     P12: (.315, .125, .546)

These 12 vertices are connected pairwise by 18 straight line segments as follows, where $P_i \leftrightarrow P_j$ denotes that
point $P_i$ is connected to point $P_j$:

P1 ↔ P2, P2 ↔ P3, P3 ↔ P4, P4 ↔ P5, P5 ↔ P6, P6 ↔ P1

P7 ↔ P8, P8 ↔ P9, P9 ↔ P10, P10 ↔ P11, P11 ↔ P12, P12 ↔ P7

P1 ↔ P7, P2 ↔ P8, P3 ↔ P9, P4 ↔ P10, P5 ↔ P11, P6 ↔ P12

In View 1 these 18 straight line segments are shown as they would appear on the video screen. It should be
noticed that only the x- and y-coordinates of the vertices are needed by the video display system to draw the
view, because only the projection of the object onto the xy-plane is displayed. However, we must keep track of
the z-coordinates to carry out certain transformations discussed later.

View 1



We now show how to form new views of the object by scaling, translating, or rotating the initial view. We first
construct a 3 × n matrix P, referred to as the coordinate matrix of the view, whose columns are the coordinates
of the n points of a view:

$$P = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \\ z_1 & z_2 & \cdots & z_n \end{bmatrix}$$

For example, the coordinate matrix P corresponding to View 1 is the 3 × 12 matrix

$$\begin{bmatrix}
1.000 & .500 & -.500 & -1.000 & -.500 & .500 & .840 & .315 & -.210 & -.360 & -.210 & .315 \\
-.800 & -.800 & -.800 & -.800 & -.800 & -.800 & -.400 & .125 & .650 & .800 & .650 & .125 \\
.000 & -.866 & -.866 & .000 & .866 & .866 & .000 & -.546 & -.364 & .000 & .364 & .546
\end{bmatrix}$$



We will show below how to transform the coordinate matrix P of a view to a new coordinate matrix P' 
corresponding to a new view of the object. The straight line segments connecting the various points move with 
the points as they are transformed. In this way, each view is uniquely determined by its coordinate matrix once 
we have specified which pairs of points in the original view are to be connected by straight lines. 



Scaling 



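The first type of transformation we consider consists of scaling a view along the x, y, and z directions by factors
of $\alpha$, $\beta$, and $\gamma$, respectively. By this we mean that if a point $P_i$ has coordinates $(x_i, y_i, z_i)$ in the original view, it
is to move to a new point $P_i'$ with coordinates $(\alpha x_i, \beta y_i, \gamma z_i)$ in the new view. This has the effect of
transforming a unit cube in the original view to a rectangular parallelepiped of dimensions $\alpha \times \beta \times \gamma$ (Figure
10.10.2). Mathematically, this may be accomplished with matrix multiplication as follows. Define a 3 × 3
diagonal matrix

$$S = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}$$

Then, if a point $P_i$ in the original view is represented by the column vector

$$\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$

then the transformed point $P_i'$ is represented by the column vector

$$\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix} =
\begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$

Using the coordinate matrix P, which contains the coordinates of all n points of the original view as its columns,
we can transform these n points simultaneously to produce the coordinate matrix P' of the scaled view, as
follows:

$$SP = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}
\begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \\ z_1 & z_2 & \cdots & z_n \end{bmatrix}
= \begin{bmatrix} \alpha x_1 & \alpha x_2 & \cdots & \alpha x_n \\ \beta y_1 & \beta y_2 & \cdots & \beta y_n \\ \gamma z_1 & \gamma z_2 & \cdots & \gamma z_n \end{bmatrix} = P'$$
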

The new coordinate matrix can then be entered into the video display system to produce the new view of the
object. As an example, View 2 is View 1 scaled by setting $\alpha = 1.8$, $\beta = 0.5$, and $\gamma = 3.0$. Note that the scaling
$\gamma = 3.0$ along the z-axis is not visible in View 2, since we see only the projection of the object onto the
xy-plane.
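As a small illustration of the scaling step, the sketch below applies the View 2 factors to the first three vertices of View 1. The function name `scale` is a hypothetical helper; because the scaling matrix is diagonal, multiplying by it amounts to multiplying each row of the coordinate matrix by its own factor, which is what the code does.

```python
def scale(P, alpha, beta, gamma):
    """Return the scaled coordinate matrix for a 3 x n matrix P stored as three rows.

    Multiplying by diag(alpha, beta, gamma) scales row 1 (x) by alpha,
    row 2 (y) by beta, and row 3 (z) by gamma.
    """
    factors = (alpha, beta, gamma)
    return [[f * v for v in row] for f, row in zip(factors, P)]

# First three vertices of View 1 (columns are P1, P2, P3).
P = [
    [1.000, 0.500, -0.500],    # x-coordinates
    [-0.800, -0.800, -0.800],  # y-coordinates
    [0.000, -0.866, -0.866],   # z-coordinates
]

P_scaled = scale(P, 1.8, 0.5, 3.0)
for row in P_scaled:
    print([round(v, 3) for v in row])
```

Only the first two rows of the result affect what appears on the screen, which is why the factor 3.0 in the z-direction leaves the displayed view unchanged.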



Figure 10.10.2

View 2 View 1 scaled by $\alpha = 1.8$, $\beta = 0.5$, $\gamma = 3.0$.



Translation 

We next consider the transformation of translating or displacing an object to a new position on the screen.
Referring to Figure 10.10.3, suppose we desire to change an existing view so that each point $P_i$ with
coordinates $(x_i, y_i, z_i)$ moves to a new point $P_i'$ with coordinates $(x_i + x_0, y_i + y_0, z_i + z_0)$. The vector

$$\begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}$$

is called the translation vector of the transformation. By defining a 3 × n matrix T as

$$T = \begin{bmatrix} x_0 & x_0 & \cdots & x_0 \\ y_0 & y_0 & \cdots & y_0 \\ z_0 & z_0 & \cdots & z_0 \end{bmatrix}$$

we can translate all n points of the view determined by the coordinate matrix P by matrix addition via the
equation

$$P' = P + T$$

The coordinate matrix P' then specifies the new coordinates of the n points. For example, if we wish to
translate View 1 according to the translation vector

$$\begin{bmatrix} 1.2 \\ 0.4 \\ 1.7 \end{bmatrix}$$

the result is View 3. Note, again, that the translation $z_0 = 1.7$ along the z-axis does not show up explicitly in
View 3.
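The translation can be sketched in the same style. In the following illustration the function name `translate` is a hypothetical helper; rather than building the matrix T explicitly, it adds the appropriate component of the translation vector to every entry of the corresponding row, which gives the same result as P + T.

```python
def translate(P, x0, y0, z0):
    """Return P + T, where every column of T is the translation vector (x0, y0, z0)."""
    offsets = (x0, y0, z0)
    return [[v + t for v in row] for t, row in zip(offsets, P)]

# First three vertices of View 1 (columns are P1, P2, P3).
P = [
    [1.000, 0.500, -0.500],    # x-coordinates
    [-0.800, -0.800, -0.800],  # y-coordinates
    [0.000, -0.866, -0.866],   # z-coordinates
]

P_translated = translate(P, 1.2, 0.4, 1.7)
for row in P_translated:
    print([round(v, 3) for v in row])
```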

View 3 View 1 translated by $x_0 = 1.2$, $y_0 = 0.4$, $z_0 = 1.7$.

Figure 10.10.3



In Exercise 7, a technique of performing translations by matrix multiplication rather than by matrix addition is 
explained. 



Rotation 

A more complicated type of transformation is a rotation of a view about one of the three coordinate axes. We
begin with a rotation about the z-axis (the axis perpendicular to the screen) through an angle $\theta$. Given a point $P_i$
in the original view with coordinates $(x_i, y_i, z_i)$, we wish to compute the new coordinates $(x_i', y_i', z_i')$ of the
rotated point $P_i'$. Referring to Figure 10.10.4 and using a little trigonometry, you should be able to derive the
following:

$$\begin{aligned}
x_i' &= \rho\cos(\phi + \theta) = \rho\cos\phi\cos\theta - \rho\sin\phi\sin\theta = x_i\cos\theta - y_i\sin\theta \\
y_i' &= \rho\sin(\phi + \theta) = \rho\cos\phi\sin\theta + \rho\sin\phi\cos\theta = x_i\sin\theta + y_i\cos\theta \\
z_i' &= z_i
\end{aligned}$$

These equations can be written in matrix form as

$$\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$

If we let R denote the 3 × 3 matrix in this equation, all n points can be rotated by the matrix product P' = RP to
yield the coordinate matrix P' of the rotated view.
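A rotation about the z-axis can be tried on a small example. The sketch below is an illustration only: the helper names `rotation_z` and `mat_mul` are assumptions, and the unit square used as input is a hypothetical test figure lying in the xy-plane, rotated through -30 degrees.

```python
import math

def rotation_z(theta_deg):
    """Rotation matrix about the z-axis for an angle given in degrees."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Coordinate matrix of a unit square in the xy-plane (columns are its vertices).
square = [
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]

rotated = mat_mul(rotation_z(-30), square)
for row in rotated:
    print([round(v, 3) for v in row])
```

A negative angle turns the figure clockwise as seen on the screen, and the z-coordinates in the third row are left untouched.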



Figure 10.10.4 



Rotations about the x- and y-axes can be accomplished analogously, and the resulting rotation matrices are
given with Views 4, 5, and 6. These three new views of the truncated pyramid correspond to rotations of View 1
about the x-, y-, and z-axes, respectively, each through an angle of 90°.

View 4 View 1 rotated 90° about the x-axis.

View 5 View 1 rotated 90° about the y-axis.

View 6 View 1 rotated 90° about the z-axis.



Rotations about three coordinate axes may be combined to give oblique views of an object. For example, View
7 is View 1 rotated first about the x-axis through 30°, then about the y-axis through -70°, and finally about the
z-axis through -27°. Mathematically, these three successive rotations can be embodied in the single
transformation equation P' = RP, where R is the product of three individual rotation matrices:

$$R_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(30°) & -\sin(30°) \\ 0 & \sin(30°) & \cos(30°) \end{bmatrix}$$

$$R_2 = \begin{bmatrix} \cos(-70°) & 0 & \sin(-70°) \\ 0 & 1 & 0 \\ -\sin(-70°) & 0 & \cos(-70°) \end{bmatrix}$$

$$R_3 = \begin{bmatrix} \cos(-27°) & -\sin(-27°) & 0 \\ \sin(-27°) & \cos(-27°) & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

in the order

$$R = R_3R_2R_1 = \begin{bmatrix} .305 & -.025 & -.952 \\ -.155 & .985 & -.076 \\ .940 & .171 & .296 \end{bmatrix}$$
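The combined rotation matrix for View 7 can be checked by multiplying the three factors numerically. The following is a minimal sketch; the helper names `rotation_x`, `rotation_y`, `rotation_z`, and `mat_mul` are illustrative assumptions, and the matrices follow the forms of the three individual rotations given for View 7.

```python
import math

def rotation_x(theta_deg):
    """Rotation about the x-axis (angle in degrees)."""
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_y(theta_deg):
    """Rotation about the y-axis (angle in degrees)."""
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotation_z(theta_deg):
    """Rotation about the z-axis (angle in degrees)."""
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

R1 = rotation_x(30)    # applied first
R2 = rotation_y(-70)   # applied second
R3 = rotation_z(-27)   # applied last

# The rotation applied first must sit rightmost in the product.
R = mat_mul(R3, mat_mul(R2, R1))
for row in R:
    print([round(v, 3) for v in row])
```

The order of the factors matters: reversing it generally produces a different matrix, since rotations about different axes do not commute.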



View 7 Oblique view of truncated pyramid.

As a final illustration, in View 8 we have two separate views of the truncated pyramid, which constitute a
stereoscopic pair. They were produced by first rotating View 7 about the y-axis through an angle of -3° and
translating it to the right, then rotating the same View 7 about the y-axis through an angle of +3° and
translating it to the left. The translation distances were chosen so that the stereoscopic views are about 2½
inches apart, the approximate distance between a pair of eyes.




View 8 Stereoscopic figure of truncated pyramid. The three-dimensionality of the diagram can be seen 
by holding the book about one foot away and focusing on a distant object. Then by shifting your 
gaze to View 8 without refocusing, you can make the two views of the stereoscopic pair merge 
together and produce the desired effect. 

Exercise Set 10.10 

1. View 9 is a view of a square with vertices (0, 0, 0), (1, 0, 0), (1, 1, 0), and (0, 1, 0).

(a) What is the coordinate matrix of View 9?

(b) What is the coordinate matrix of View 9 after it is scaled by a factor 1½ in the x-direction and ½ in the
y-direction? Draw a sketch of the scaled view.

(c) What is the coordinate matrix of View 9 after it is translated by the following vector?

$$\begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix}$$

Draw a sketch of the translated view.

(d) What is the coordinate matrix of View 9 after it is rotated through an angle of -30° about the z-axis?
Draw a sketch of the rotated view.



0 I 



Ex- View 9 Square with vertices (0, 0, 0), (1, 0, 0), (1, 1, 0), and (0, 1, 0) (Exercises 1 and 2) 



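Answer:

(a) \[ \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 0 & \tfrac32 & \tfrac32 & 0 \\ 0 & 0 & \tfrac12 & \tfrac12 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

(c) \[ \begin{bmatrix} -2 & -1 & -1 & -2 \\ -1 & -1 & 0 & 0 \\ 3 & 3 & 3 & 3 \end{bmatrix} \]

(d) \[ \begin{bmatrix} 0 & .866 & 1.366 & .500 \\ 0 & -.500 & .366 & .866 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]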



2. (a) If the coordinate matrix of View 9 is multiplied by the matrix

\[ \begin{bmatrix} 1 & \tfrac12 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

the result is the coordinate matrix of View 10. Such a transformation is called a shear in the x-direction
with factor ½ with respect to the y-coordinate. Show that under such a transformation, a point with
coordinates $(x_i, y_i, z_i)$ has new coordinates $(x_i + \tfrac12 y_i,\ y_i,\ z_i)$.

(b) What are the coordinates of the four vertices of the sheared square in View 10?

(c) The matrix

\[ \begin{bmatrix} 1 & 0 & 0 \\ .6 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

determines a shear in the y-direction with factor .6 with respect to the x-coordinate (an example appears
in View 11). Sketch a view of the square in View 9 after such a shearing transformation, and find the
new coordinates of its four vertices.

Ex- View 10 View 9 sheared along the x-axis by ½ with respect to the y-coordinate (Exercise 2).

Ex- View 11 View 1 sheared along the y-axis by .6 with respect to the x-coordinate (Exercise 2).



Answer:

(b) (0, 0, 0), (1, 0, 0), (3/2, 1, 0), and (1/2, 1, 0)

(c) (0, 0, 0), (1, .6, 0), (1, 1.6, 0), (0, 1, 0)

3. (a) The reflection about the xz-plane is defined as the transformation that takes a point $(x_i, y_i, z_i)$ to the
point $(x_i, -y_i, z_i)$ (e.g., View 12). If P and P' are the coordinate matrices of a view and its reflection
about the xz-plane, respectively, find a matrix M such that P' = MP.

(b) Analogous to part (a), define the reflection about the yz-plane and construct the corresponding
transformation matrix. Draw a sketch of View 1 reflected about the yz-plane.

(c) Analogous to part (a), define the reflection about the xy-plane and construct the corresponding
transformation matrix. Draw a sketch of View 1 reflected about the xy-plane.




Ex- View 12 View 1 reflected about the xz-plane (Exercise 3). 



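Answer:

(a) \[ M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(b) \[ M = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(c) \[ M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \]
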
4. (a) View 13 is View 1 subject to the following five transformations:

1. Scale by a factor of ½ in the x-direction, 2 in the y-direction, and ⅓ in the z-direction.

2. Translate ½ unit in the x-direction.

3. Rotate 20° about the x-axis.

4. Rotate −45° about the y-axis.

5. Rotate 90° about the z-axis.

Construct the five matrices $M_1$, $M_2$, $M_3$, $M_4$, and $M_5$ associated with these five transformations.

(b) If P is the coordinate matrix of View 1 and P' is the coordinate matrix of View 13, express P' in terms
of $M_1$, $M_2$, $M_3$, $M_4$, $M_5$, and P.




Ex- View 13 View 1 scaled, translated, and rotated (Exercise 4) 



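Answer:

(a)
\[
M_1 = \begin{bmatrix} \tfrac12 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \tfrac13 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} \tfrac12 & \tfrac12 & \cdots & \tfrac12 \\ 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \quad
M_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 20^\circ & -\sin 20^\circ \\ 0 & \sin 20^\circ & \cos 20^\circ \end{bmatrix},
\]
\[
M_4 = \begin{bmatrix} \cos(-45^\circ) & 0 & \sin(-45^\circ) \\ 0 & 1 & 0 \\ -\sin(-45^\circ) & 0 & \cos(-45^\circ) \end{bmatrix}, \quad
M_5 = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

(b) \[ P' = M_5M_4M_3(M_1P + M_2) \]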

5. (a) View 14 is View 1 subject to the following seven transformations:

1. Scale by a factor of .3 in the x-direction and by a factor of .5 in the y-direction.

2. Rotate 45° about the x-axis.

3. Translate 1 unit in the x-direction.

4. Rotate 35° about the y-axis.

5. Rotate −45° about the z-axis.

6. Translate 1 unit in the z-direction.

7. Scale by a factor of 2 in the x-direction.

Construct the matrices $M_1, M_2, \ldots, M_7$ associated with these seven transformations.

(b) If P is the coordinate matrix of View 1 and P' is the coordinate matrix of View 14, express P' in terms
of $M_1, M_2, \ldots, M_7$, and P.
































Ex- View 14 View 1 scaled, translated, and rotated (Exercise 5). 



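Answer:

(a)
\[
M_1 = \begin{bmatrix} .3 & 0 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 45^\circ & -\sin 45^\circ \\ 0 & \sin 45^\circ & \cos 45^\circ \end{bmatrix}, \quad
M_3 = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \end{bmatrix},
\]
\[
M_4 = \begin{bmatrix} \cos 35^\circ & 0 & \sin 35^\circ \\ 0 & 1 & 0 \\ -\sin 35^\circ & 0 & \cos 35^\circ \end{bmatrix}, \quad
M_5 = \begin{bmatrix} \cos(-45^\circ) & -\sin(-45^\circ) & 0 \\ \sin(-45^\circ) & \cos(-45^\circ) & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
\[
M_6 = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \end{bmatrix}, \quad
M_7 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

(b) \[ P' = M_7(M_5M_4(M_2M_1P + M_3) + M_6) \]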

6. Suppose that a view with coordinate matrix P is to be rotated through an angle θ about an axis through the
origin and specified by two angles α and β (see Figure Ex-6). If P' is the coordinate matrix of the rotated
view, find rotation matrices $R_1, R_2, R_3, R_4, R_5$ such that

\[ P' = R_5R_4R_3R_2R_1P \]

[Hint: The desired rotation can be accomplished in the following five steps:

1. Rotate through an angle of β about the y-axis.

2. Rotate through an angle of α about the z-axis.

3. Rotate through an angle of θ about the y-axis.

4. Rotate through an angle of −α about the z-axis.

5. Rotate through an angle of −β about the y-axis.]




Figure Ex-6 



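Answer:

\[
R_1 = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad
R_2 = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_3 = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix},
\]
\[
R_4 = \begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_5 = \begin{bmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{bmatrix}
\]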



7. This exercise illustrates a technique for translating a point with coordinates $(x_i, y_i, z_i)$ to a point with
coordinates $(x_i + x_0,\ y_i + y_0,\ z_i + z_0)$ by matrix multiplication rather than matrix addition.

(a) Let the point $(x_i, y_i, z_i)$ be associated with the column vector

\[ \mathbf{v}_i = \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix} \]

and let the point $(x_i + x_0,\ y_i + y_0,\ z_i + z_0)$ be associated with the column vector

\[ \mathbf{v}_i' = \begin{bmatrix} x_i + x_0 \\ y_i + y_0 \\ z_i + z_0 \\ 1 \end{bmatrix} \]

Find a 4 × 4 matrix M such that $\mathbf{v}_i' = M\mathbf{v}_i$.

(b) Find the specific 4 × 4 matrix of the above form that will effect the translation of the point (4, −2, 3)
to the point (−1, 7, 0).



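Answer:

(a) \[ M = \begin{bmatrix} 1 & 0 & 0 & x_0 \\ 0 & 1 & 0 & y_0 \\ 0 & 0 & 1 & z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

(b) \[ M = \begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 9 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]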
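
The technique of Exercise 7 can be tried out in a few lines. This is a minimal Python sketch, assuming points are extended with a fourth coordinate equal to 1; the function names translation_matrix and apply are hypothetical helpers introduced only for illustration.

```python
def translation_matrix(x0, y0, z0):
    """4 x 4 matrix that translates by (x0, y0, z0) when applied to (x, y, z, 1)."""
    return [[1, 0, 0, x0],
            [0, 1, 0, y0],
            [0, 0, 1, z0],
            [0, 0, 0, 1]]

def apply(M, point):
    """Multiply M by the column vector (x, y, z, 1) and drop the trailing 1."""
    v = list(point) + [1]
    w = [sum(M[r][c] * v[c] for c in range(4)) for r in range(4)]
    return tuple(w[:3])

# Translation taking (4, -2, 3) to (-1, 7, 0): the offsets are the differences.
M = translation_matrix(-1 - 4, 7 - (-2), 0 - 3)
moved = apply(M, (4, -2, 3))
print(moved)
```

Writing the translation as a matrix product means that it can be combined with scalings and rotations, also extended to 4 × 4 form, by ordinary matrix multiplication.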



8. For the three rotation matrices given with Views 4, 5, and 6, show that

\[ R^{-1} = R^T \]

(A matrix with this property is called an orthogonal matrix. See Section 7.1.)



Section 10.10 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a 
basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you 
will be able to use your technology utility to solve many of the problems in the regular exercise sets. 

T1. Let (a, b, c) be a unit vector normal to the plane ax + by + cz = 0, and let r = (x, y, z) be a vector. It
can be shown that the mirror image of the vector r through the above plane has coordinates
$\mathbf{r}_m = (x_m, y_m, z_m)$, where

\[ \begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

with

\[
M = I - 2\mathbf{n}\mathbf{n}^T =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
- 2 \begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix}
\]

(a) Show that $M^2 = I$ and give a physical reason why this must be so. [Hint: Use the fact that (a, b, c) is a
unit vector to show that $\mathbf{n}^T\mathbf{n} = 1$.]

(b) Use a computer to show that det(M) = −1.

(c) The eigenvectors of M satisfy the equation

\[
\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

and therefore correspond to those vectors whose direction is not affected by a reflection through the plane.
Use a computer to determine the eigenvectors and eigenvalues of M, and then give a physical argument to
support your answer.

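T2. A vector v = (x, y, z) is rotated by an angle θ about an axis having unit vector (a, b, c), thereby forming
the rotated vector $\mathbf{v}_R = (x_R, y_R, z_R)$. It can be shown that

\[ \begin{bmatrix} x_R \\ y_R \\ z_R \end{bmatrix} = R(\theta) \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

with

\[
R(\theta) = \cos(\theta) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
+ (1 - \cos(\theta)) \begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix}
+ \sin(\theta) \begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}
\]

(a) Use a computer to show that $R(\theta)R(\varphi) = R(\theta + \varphi)$, and then give a physical reason why this must be so.
Depending on the sophistication of the computer you are using, you may have to experiment using different
values of a, b, and $c = \sqrt{1 - a^2 - b^2}$.

(b) Show also that $R^{-1}(\theta) = R(-\theta)$ and give a physical reason why this must be so.

(c) Use a computer to show that $\det(R(\theta)) = +1$.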
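
One way to experiment with the formula for R(θ) in Exercise T2 is the short Python sketch below. It is only a numerical check for one sample axis and one pair of angles, not a proof; the function names rotation, matmul, and det3, and the sample values, are assumptions made for illustration.

```python
import math

def rotation(axis, theta):
    """R(theta) = cos(theta) I + (1 - cos(theta)) n n^T + sin(theta) K for a unit axis n."""
    a, b, c = axis
    ct, st = math.cos(theta), math.sin(theta)
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    outer = [[a * a, a * b, a * c], [b * a, b * b, b * c], [c * a, c * b, c * c]]
    cross = [[0, -c, b], [c, 0, -a], [-b, a, 0]]
    return [[ct * identity[i][j] + (1 - ct) * outer[i][j] + st * cross[i][j]
             for j in range(3)] for i in range(3)]

def matmul(A, B):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

axis = (1 / 3, 2 / 3, 2 / 3)      # sample unit vector: 1/9 + 4/9 + 4/9 = 1
theta, phi = 0.7, 1.9             # sample angles in radians

lhs = matmul(rotation(axis, theta), rotation(axis, phi))
rhs = rotation(axis, theta + phi)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print("largest entry of R(theta)R(phi) - R(theta + phi):", max_diff)
print("det R(theta):", det3(rotation(axis, theta)))
```

Repeating the check for other unit vectors and angles gives numerical evidence for parts (a) and (c); part (b) can be tested the same way by multiplying rotation(axis, theta) by rotation(axis, -theta).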

Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



10.11 Equilibrium Temperature Distributions 



In this section we will see that the equilibrium temperature distribution within a trapezoidal plate can be found 
when the temperatures around the edges of the plate are specified. The problem is reduced to solving a system of 
linear equations. Also, an iterative technique for solving the problem and a "random walk" approach to the 
problem are described. 



Prerequisites 



Linear Systems 
Matrices 

Intuitive Understanding of Limits 






Boundary Data 

Suppose that the two faces of the thin trapezoidal plate shown in Figure 10.11.1a are insulated from heat. Suppose 
that we are also given the temperature along the four edges of the plate. For example, let the temperature be 
constant on each edge with values of 0°, 0°, 1°, and 2°, as in the figure. After a period of time, the temperature
inside the plate will stabilize. Our objective in this section is to determine this equilibrium temperature distribution 
at the points inside the plate. As we will see, the interior equilibrium temperature is completely determined by the 
boundary data — that is, the temperature along the edges of the plate. 




(a) (b)
Figure 10.11.1

The equilibrium temperature distribution can be visualized by the use of curves that connect points of equal
temperature. Such curves are called isotherms of the temperature distribution. In Figure 10.11.1b we have
sketched a few isotherms, using information we derive later in the chapter.



Although all our calculations will be for the trapezoidal plate illustrated, our techniques generalize easily to a plate 
of any practical shape. They also generalize to the problem of finding the temperature within a three-dimensional 
body. In fact, our "plate" could be the cross section of some solid object if the flow of heat perpendicular to the 
cross section is negligible. For example, Figure 10.11.1 could represent the cross section of a long dam. The dam is
exposed to three different temperatures: the temperature of the ground at its base, the temperature of the water on 
one side, and the temperature of the air on the other side. A knowledge of the temperature distribution inside the 
dam is necessary to determine the thermal stresses to which it is subjected. 

Next we will consider a certain thermodynamic principle that characterizes the temperature distribution we are 
seeking. 



The Mean-Value Property 

There are many different ways to obtain a mathematical model for our problem. The approach we use is based on 
the following property of equilibrium temperature distributions. 



THEOREM 10.11.1 The Mean-Value Property 

Let a plate be in thermal equilibrium and let P be a point inside the plate. Then if C is any circle with 
center at P that is completely contained in the plate, the temperature at P is the average value of the 
temperature on the circle (Figure 10.11.2). 





Figure 10.11.2 



This property is a consequence of certain basic laws of molecular motion, and we will not attempt to derive it. 
Basically, this property states that in equilibrium, thermal energy tends to distribute itself as evenly as possible 
consistent with the boundary conditions. It can be shown that the mean-value property uniquely determines the 
equilibrium temperature distribution of a plate. 

Unfortunately, determining the equilibrium temperature distribution from the mean-value property is not an easy
matter. However, if we restrict ourselves to finding the temperature only at a finite set of points within the plate, 
the problem can be reduced to solving a linear system. We pursue this idea next. 



Discrete Formulation of the Problem



We can overlay our trapezoidal plate with a succession of finer and finer square nets or meshes (Figure 10.11.3). In
(a) we have a rather coarse net; in (b) we have a net with half the spacing as in (a); and in (c) we have a net with
the spacing again reduced by half. The points of intersection of the net lines are called mesh points. We classify
them as boundary mesh points if they fall on the boundary of the plate or as interior mesh points if they lie in the
interior of the plate. For the three net spacings we have chosen, there are 1, 9, and 49 interior mesh points,
respectively.




(a) 1 interior mesh point   (b) 9 interior mesh points   (c) 49 interior mesh points



Figure 10.11.3 



In the discrete formulation of our problem, we try to find the temperature only at the interior mesh points of some 
particular net. For a rather fine net, as in (c), this will provide an excellent picture of the temperature distribution 
throughout the entire plate. 

At the boundary mesh points, the temperature is given by the boundary data. (In Figure 10.11.3 we have labeled all
the boundary mesh points with their corresponding temperatures.) At the interior mesh points, we will apply the
following discrete version of the mean-value property.



THEOREM 10.11.2 Discrete Mean-Value Property

At each interior mesh point, the temperature is approximately the average of the temperatures at the four
neighboring mesh points.



This discrete version is a reasonable approximation to the true mean-value property. But because it is only an
approximation, it will provide only an approximation to the true temperatures at the interior mesh points. However, 
the approximations will get better as the mesh spacing decreases. In fact, as the mesh spacing approaches zero, the 
approximations approach the exact temperature distribution, a fact proved in advanced courses in numerical 
analysis. We will illustrate this convergence by computing the approximate temperatures at the mesh points for the 
three mesh spacings given in Figure 10.11.3. 

Case (a) of Figure 10.11.3 is simple, for there is only one interior mesh point. If we let $t_0$ be the temperature at this
mesh point, the discrete mean-value property immediately gives

\[ t_0 = \tfrac14(2 + 1 + 0 + 0) = 0.75 \]

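In case (b) we can label the temperatures at the nine interior mesh points $t_1, t_2, \ldots, t_9$, as in Figure 10.11.3b. (The
particular ordering is not important.) By applying the discrete mean-value property successively to each of these
nine mesh points, we obtain the following nine equations:

\[
\begin{aligned}
t_1 &= \tfrac14(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac14(t_1 + t_3 + t_4 + 2) \\
t_3 &= \tfrac14(t_2 + t_5 + 0 + 0) \\
t_4 &= \tfrac14(t_2 + t_5 + t_7 + 2) \\
t_5 &= \tfrac14(t_3 + t_4 + t_6 + t_8) \\
t_6 &= \tfrac14(t_5 + t_9 + 0 + 0) \\
t_7 &= \tfrac14(t_4 + t_8 + 1 + 2) \\
t_8 &= \tfrac14(t_5 + t_7 + t_9 + 1) \\
t_9 &= \tfrac14(t_6 + t_8 + 0 + 1)
\end{aligned}
\tag{1}
\]
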

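This is a system of nine linear equations in nine unknowns. We can rewrite it in matrix form as

\[ \mathbf{t} = M\mathbf{t} + \mathbf{b} \tag{2} \]

where

\[
\mathbf{t} = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_9 \end{bmatrix}, \qquad
M = \frac14 \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}, \qquad
\mathbf{b} = \begin{bmatrix} \tfrac12 \\ \tfrac12 \\ 0 \\ \tfrac12 \\ 0 \\ 0 \\ \tfrac34 \\ \tfrac14 \\ \tfrac14 \end{bmatrix}
\]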



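To solve Equation 2, we write it as

\[ (I - M)\mathbf{t} = \mathbf{b} \]

The solution for t is thus

\[ \mathbf{t} = (I - M)^{-1}\mathbf{b} \tag{3} \]

as long as the matrix (I − M) is invertible. This is indeed the case, and the solution for t as calculated by 3 is

\[
\mathbf{t} = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix}
\tag{4}
\]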

Figure 10.11.4 is a diagram of the plate with the nine interior mesh points labeled with their temperatures as given 
by this solution. 
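
The nine-point system can be solved directly on a computer. The following minimal Python sketch builds I − M and b from the nine mean-value equations and applies Gaussian elimination with partial pivoting; the names NEIGHBORS, B, and solve are illustrative assumptions, and any linear algebra utility could be used in their place.

```python
# Interior neighbors of each mesh point, read off the nine mean-value equations.
NEIGHBORS = {1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 5, 7], 5: [3, 4, 6, 8],
             6: [5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8]}
# One quarter of the sum of the boundary temperatures adjacent to each point.
B = [0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.75, 0.25, 0.25]

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

# Entry (i, j) of I - M is 1 on the diagonal and -1/4 where j is a neighbor of i.
I_minus_M = [[(1.0 if i == j else 0.0) - (0.25 if j in NEIGHBORS[i] else 0.0)
              for j in range(1, 10)] for i in range(1, 10)]
t = solve(I_minus_M, B)
print([round(value, 4) for value in t])
```

The rounded values should agree with Equation 4, since those temperatures satisfy each of the nine equations to four decimal places.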





Figure 10.11.4 



For case (c) of Figure 10.11.3, we repeat this same procedure. We label the temperatures at the 49 interior mesh
points as $t_1, t_2, \ldots, t_{49}$ in some manner. For example, we may begin at the top of the plate and proceed from left to
right along each row of mesh points. Applying the discrete mean-value property to each mesh point gives a system
of 49 linear equations in 49 unknowns:

\[
\begin{aligned}
t_1 &= \tfrac14(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac14(t_1 + t_3 + t_4 + 2) \\
&\;\;\vdots \\
t_{48} &= \tfrac14(t_{41} + t_{47} + t_{49} + 1) \\
t_{49} &= \tfrac14(t_{42} + t_{48} + 0 + 1)
\end{aligned}
\tag{5}
\]

In matrix form, Equations 5 are

\[ \mathbf{t} = M\mathbf{t} + \mathbf{b} \]

where t and b are column vectors with 49 entries, and M is a 49 × 49 matrix. As in 3, the solution for t is

\[ \mathbf{t} = (I - M)^{-1}\mathbf{b} \tag{6} \]

In Figure 10.11.5 we display the temperatures at the 49 mesh points found by Equation 6. The nine unshaded
temperatures in this figure fall on the mesh points of Figure 10.11.4.

1.3625  0.8048  0.3528
1.4844  1.0122  0.6064  0.2710
1.5627  1.1533  0.7896  0.4778  0.2162
1.6131  1.2488  0.9210  0.6342  0.3868  0.1756
1.6409  1.3078  1.0114  0.7513  0.5214  0.3157  0.1344
1.6426  1.3301  1.0657  0.8380  0.6318  0.4312  0.2221
1.5994  1.3042  1.0834  0.9032  0.7365  0.5554  0.3227
1.4508  1.2039  1.0605  0.9548  0.8556  0.7311  0.5135


Figure 10.11.5 



In Table 1 we compare the temperatures at these nine common mesh points for the three different mesh spacings 
used. 



Table 1

Temperatures at Common Mesh Points

        Case (a)    Case (b)    Case (c)
t1                  0.7846      0.8048
t2                  1.1383      1.1533
t3                  0.4719      0.4778
t4                  1.2967      1.3078
t5      0.7500      0.7491      0.7513
t6                  0.3265      0.3157
t7                  1.2995      1.3042
t8                  0.9014      0.9032
t9                  0.5570      0.5554

Knowing that the temperatures of the discrete problem approach the exact temperatures as the mesh spacing
decreases, we may surmise that the nine temperatures obtained in case (c) are closer to the exact values than those
in case (b).



A Numerical Technique 

To obtain the 49 temperatures in case (c) of Figure 10.11.3, it was necessary to solve a linear system with 49 
unknowns. A finer net might involve a linear system with hundreds or even thousands of unknowns. Exact 
algorithms for the solutions of such large systems are impractical, and for this reason we now discuss a numerical 
technique for the practical solution of these systems. 

To describe this technique, we look again at Equation 2: 

\[ \mathbf{t} = M\mathbf{t} + \mathbf{b} \tag{7} \]

The vector t we are seeking appears on both sides of this equation. We consider a way of generating better and
better approximations to the vector solution t. For the initial approximation $\mathbf{t}^{(0)}$ we can take $\mathbf{t}^{(0)} = \mathbf{0}$ if no better
choice is available. If we substitute $\mathbf{t}^{(0)}$ into the right side of 7 and label the resulting left side as $\mathbf{t}^{(1)}$, we have

\[ \mathbf{t}^{(1)} = M\mathbf{t}^{(0)} + \mathbf{b} \tag{8} \]

If we substitute $\mathbf{t}^{(1)}$ into the right side of 7, we generate another approximation, which we label $\mathbf{t}^{(2)}$:

\[ \mathbf{t}^{(2)} = M\mathbf{t}^{(1)} + \mathbf{b} \tag{9} \]

Continuing in this way, we generate a sequence of approximations as follows:

\[
\begin{aligned}
\mathbf{t}^{(1)} &= M\mathbf{t}^{(0)} + \mathbf{b} \\
\mathbf{t}^{(2)} &= M\mathbf{t}^{(1)} + \mathbf{b} \\
\mathbf{t}^{(3)} &= M\mathbf{t}^{(2)} + \mathbf{b} \\
&\;\;\vdots \\
\mathbf{t}^{(n)} &= M\mathbf{t}^{(n-1)} + \mathbf{b} \\
&\;\;\vdots
\end{aligned}
\tag{10}
\]

One would hope that this sequence of approximations $\mathbf{t}^{(0)}, \mathbf{t}^{(1)}, \mathbf{t}^{(2)}, \ldots$ converges to the exact solution of 7. We do
not have the space here to go into the theoretical considerations necessary to show this. Suffice it to say that for the
particular problem we are considering, the sequence converges to the exact solution for any mesh size and for any
initial approximation $\mathbf{t}^{(0)}$.

This technique of generating successive approximations to the solution of 7 is a variation of a technique called
Jacobi iteration; the approximations themselves are called iterates. As a numerical example, let us apply Jacobi
iteration to the calculation of the nine mesh point temperatures of case (b). Setting $\mathbf{t}^{(0)} = \mathbf{0}$, we have, from
Equation 2,

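\[
\mathbf{t}^{(1)} = M\mathbf{t}^{(0)} + \mathbf{b} = M\mathbf{0} + \mathbf{b} = \mathbf{b} =
\begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix},
\qquad
\mathbf{t}^{(2)} = M\mathbf{t}^{(1)} + \mathbf{b} =
\begin{bmatrix} .6250 \\ .7500 \\ .1250 \\ .8125 \\ .1875 \\ .0625 \\ .9375 \\ .5000 \\ .3125 \end{bmatrix}
\]
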
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Some additional iterates are 



0.6875 




0.7791 
All iterates beginning with the thirtieth are equal to $\mathbf{t}^{(30)}$ to four decimal places. Consequently, $\mathbf{t}^{(30)}$ is the exact
solution to four decimal places. This agrees with our previous result given in Equation 4.

The Jacobi iteration scheme applied to the linear system 5 with 49 unknowns produces iterates that begin repeating
to four decimal places after 119 iterations. Thus, $\mathbf{t}^{(119)}$ would provide the 49 temperatures of case (c) correct to
four decimal places.
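
The iteration for the nine-point mesh of case (b) can be sketched in Python as follows. The stopping rule used here (successive iterates differing by less than a small tolerance) is an assumption chosen for illustration and is not the four-decimal criterion used in the text, so the iteration count it reports need not equal 30; the names NEIGHBORS, B, jacobi_step, and jacobi are likewise illustrative.

```python
# Interior neighbors of each mesh point and the boundary contributions b,
# both read off the nine mean-value equations for case (b).
NEIGHBORS = {1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 5, 7], 5: [3, 4, 6, 8],
             6: [5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8]}
B = [0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.75, 0.25, 0.25]

def jacobi_step(t):
    """One step t_new = M t + b, computed without forming M explicitly."""
    return [0.25 * sum(t[j - 1] for j in NEIGHBORS[i]) + B[i - 1]
            for i in range(1, 10)]

def jacobi(tol=1e-6, max_iter=1000):
    """Iterate from the zero vector until successive iterates differ by less than tol."""
    t = [0.0] * 9
    for k in range(1, max_iter + 1):
        new = jacobi_step(t)
        if max(abs(a - b) for a, b in zip(new, t)) < tol:
            return new, k
        t = new
    raise RuntimeError("no convergence within max_iter iterations")

t1 = jacobi_step([0.0] * 9)   # first iterate, equal to b
t2 = jacobi_step(t1)          # second iterate
solution, steps = jacobi()
print("t(1) =", t1)
print("t(2) =", t2)
print("converged after", steps, "steps:", [round(v, 4) for v in solution])
```

Each step uses only the previous iterate, so the matrix M never has to be stored; this is what makes the method practical for fine meshes with many unknowns.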



A Monte Carlo Technique 



In this section we describe a so-called Monte Carlo technique for computing the temperature at a single interior 
mesh point of the discrete problem without having to compute the temperatures at the remaining interior mesh 
points. First we define a discrete random walk along the net. By this we mean a directed path along the net lines 
(Figure 10.11.6) that joins a succession of mesh points such that the direction of departure from each mesh point is 
chosen at random. Each of the four possible directions of departure from each mesh point along the path is to be 
equally probable. 




Figure 10.11.6 



By the use of random walks, we can compute the temperature at a specified interior mesh point on the basis of the 
following property. 



THEOREM 10.11.3 Random Walk Property 



Let $W_1, W_2, \ldots, W_n$ be a succession of random walks, all of which begin at a specified interior mesh point.
Let $t_1^*, t_2^*, \ldots, t_n^*$ be the temperatures at the boundary mesh points first encountered along each of these
random walks. Then the average value $(t_1^* + t_2^* + \cdots + t_n^*)/n$ of these boundary temperatures approaches
the temperature at the specified interior mesh point as the number of random walks n increases without
bound.



This property is a consequence of the discrete mean-value property that the mesh point temperatures satisfy. The
proof of the random walk property involves elementary concepts from probability theory, and we will not give it
here.

In Table 2 we display the results of a large number of computer-generated random walks for the evaluation of the
temperature $t_5$ of the nine-point mesh of case (b) in Figure 10.11.6. The first column lists the number n of the
random walk. The second column lists the temperature $t_n^*$ of the boundary point first encountered along the
corresponding random walk. The last column contains the cumulative average of the boundary temperatures
encountered along the n random walks. Thus, after 1000 random walks we have the approximation $t_5 \approx .7550$.
This compares with the exact value $t_5 = .7491$ that we had previously evaluated. As can be seen, the convergence
to the exact value is not too rapid.
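
A random-walk estimate of this kind can be simulated with the Python sketch below. It assumes the mesh of case (b), with the interior neighbors and adjacent boundary temperatures of each mesh point read off the nine mean-value equations; the names INTERIOR, BOUNDARY, walk, and estimate are illustrative, and the seed is fixed only so that runs are repeatable. The numbers it produces will not reproduce Table 2, which came from a different sequence of random walks.

```python
import random

# For each interior mesh point: its interior neighbors and the temperatures
# of its neighboring boundary mesh points (four neighbors in total).
INTERIOR = {1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 5, 7], 5: [3, 4, 6, 8],
            6: [5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8]}
BOUNDARY = {1: [2, 0, 0], 2: [2], 3: [0, 0], 4: [2], 5: [],
            6: [0, 0], 7: [1, 2], 8: [1], 9: [0, 1]}

def walk(start, rng):
    """Follow one random walk from start; return the first boundary temperature met."""
    node = start
    while True:
        moves = ([("interior", j) for j in INTERIOR[node]]
                 + [("boundary", temp) for temp in BOUNDARY[node]])
        kind, value = rng.choice(moves)   # each of the four directions equally likely
        if kind == "boundary":
            return value
        node = value

def estimate(start, n, seed=0):
    """Average of the boundary temperatures reached by n random walks from start."""
    rng = random.Random(seed)
    return sum(walk(start, rng) for _ in range(n)) / n

for n in (10, 100, 1000, 10000):
    print(n, round(estimate(5, n), 4))
```

The estimates drift toward .7491 only slowly as n grows, which illustrates the remark about the rate of convergence.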

Table 2 
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Exercise Set 10.11 

1. A plate in the form of a circular disk has boundary temperatures of 0° on the left half of its circumference and 1° on
the right half of its circumference. A net with four interior mesh points is overlaid on the disk (see Figure
Ex-1).

(a) Using the discrete mean-value property, write the 4 × 4 linear system t = Mt + b that determines the
approximate temperatures at the four interior mesh points.

(b) Solve the linear system in part (a).

(c) Use the Jacobi iteration scheme with $\mathbf{t}^{(0)} = \mathbf{0}$ to generate the iterates $\mathbf{t}^{(1)}, \mathbf{t}^{(2)}, \mathbf{t}^{(3)}, \mathbf{t}^{(4)}$, and $\mathbf{t}^{(5)}$ for the
linear system in part (a). What is the "error vector" $\mathbf{t}^{(5)} - \mathbf{t}$, where t is the solution found in part (b)?

(d) By certain advanced methods, it can be determined that the exact temperatures to four decimal places at the
four mesh points are $t_1 = t_3 = .2871$ and $t_2 = t_4 = .7129$. What are the percentage errors in the values
found in part (b)?
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Figure Ex-1 



Answer: 



(a) 



(b) 



t = 



(c) 
(d) for $t_1$ and $t_3$, −12.9%; for $t_2$ and $t_4$, 5.2%

2. Use Theorem 10.11.1 to find the exact equilibrium temperature at the center of the disk in Exercise 1 . 



Answer: 



1 

2 

3. Calculate the first two iterates $\mathbf{t}^{(1)}$ and $\mathbf{t}^{(2)}$ for case (b) of Figure 10.11.3 with nine interior mesh points
[Equation 2] when the initial iterate is chosen as

\[ \mathbf{t}^{(0)} = [1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1 \ 1]^T \]



Answer: 
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4. The random walk illustrated in Figure Ex-4a can be described by six arrows 

that specify the directions of departure from the successive mesh points along the path. Figure Ex-4b is an array
of 100 computer-generated, randomly oriented arrows arranged in a 10 × 10 array. Use these arrows to
determine random walks to approximate the temperature $t_5$, as in Table 2. Proceed as follows:

1. Take the last two digits of your telephone number. Use the last digit to specify a row and the other to specify
a column.

2. Go to the arrow in the array with that row and column number.

3. Using this arrow as a starting point, move through the array of arrows as you would read a book (left to right
and top to bottom). Beginning at the point labeled $t_5$ in Figure Ex-4a and using this sequence of arrows to
specify a sequence of directions, move from mesh point to mesh point until you reach a boundary mesh
point. This completes your first random walk. Record the temperature at the boundary mesh point. (If you
reach the end of the arrow array, continue with the arrow in the upper left corner.)

4. Return to the interior mesh point labeled $t_5$ and begin where you left off in the arrow array; generate your
next random walk. Repeat this process until you have completed 10 random walks and have recorded 10
boundary temperatures.

5. Calculate the average of the 10 boundary temperatures recorded. (The exact value is $t_5 = .7491$.)
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Figure Ex-4 



Section 10.11 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 

T1. Suppose that we have the square region described by

\[ R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\} \]

and suppose that the equilibrium temperature distribution u(x, y) along the boundary is given by $u(x, 0) = T_B$,
$u(x, 1) = T_T$, $u(0, y) = T_L$, and $u(1, y) = T_R$. Suppose next that this region is partitioned into an
(n + 1) × (n + 1) mesh using

\[ x_i = \frac{i}{n} \quad \text{and} \quad y_j = \frac{j}{n} \]

for i = 0, 1, 2, ..., n and j = 0, 1, 2, ..., n. If the temperatures of the interior mesh points are labeled by

\[ u_{i,j} = u(x_i, y_j) = u(i/n, j/n) \]

then show that

\[ u_{i,j} = \tfrac14\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right) \]

for i = 1, 2, 3, ..., n − 1 and j = 1, 2, 3, ..., n − 1. To handle the boundary points, define

\[ u_{0,j} = T_L, \quad u_{n,j} = T_R, \quad u_{i,0} = T_B, \quad \text{and} \quad u_{i,n} = T_T \]

for i = 1, 2, 3, ..., n − 1 and j = 1, 2, 3, ..., n − 1. Next let

\[ F_{n+1} = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix} \]

be the (n + 1) × (n + 1) matrix with the n × n identity matrix in the upper right-hand corner, a one in the lower
left-hand corner, and zeros everywhere else. For example,



F2 = 
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1 0 
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0 0 10 
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and so on. By defining the (« + 1) x (« + 1) matrix 



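show that if $U_{n+1}$ is the (n + 1) × (n + 1) matrix with entries $u_{ij}$, then the set of equations

\[ u_{i,j} = \tfrac14\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right) \]

for i = 1, 2, 3, ..., n − 1 and j = 1, 2, 3, ..., n − 1 can be written as the matrix equation

\[ U_{n+1} = \tfrac14\left(M_{n+1}U_{n+1} + U_{n+1}M_{n+1}\right) \]

where we consider only those elements of $U_{n+1}$ with i = 1, 2, 3, ..., n − 1 and j = 1, 2, 3, ..., n − 1.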

T2. The results of the preceding exercise and the discussion in the text suggest the following algorithm for solving 
for the equilibrium temperature in the square region 

\[ R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\} \]

given the boundary conditions

\[ u(x, 0) = T_B, \quad u(x, 1) = T_T, \quad u(0, y) = T_L, \quad u(1, y) = T_R \]

1 . Choose a value for n, and then choose an initial guess, say 

" 0 Ti ... Tl 0 " 
Tb 0 ... 0 Tt 



- 



Tb 0 ... 0 Tj 
0 Tr ... Tr 0 



2. For each value of it = 0, 1, 2, 3 compute U^^^^ using 

where Jlrf"M+i is as defined in Exercise Tl . Then adjust U^^^^ by replacing all edge entries by the initial edge 



entries in • [Note: The edge entries of a matrix are the entries in the first and last columns and first and 
last rows.] 

3. Continue this process until U^^^^ — C/^^i approximately the zero matrix. This suggests that 



Use a computer and this algorithm to solve for y) given that 

uix,0) = 0. u(x,-l) = 0, u(0,y) = 0, uO,y) = 2 
Choose K = 6 and compute up to U^^^ ■ The exact solution can be expressed as 

„/_ = "T s"^' [ ^2/^; - 1 ) '-^ ] sm [ (2.:'; - 1 ; ry ] 
'm^l (2«-l)siiih[(2M-l)ir] 

Use a computer to compute u(i f 6, J 1 6) for i,J = 0, 1, 2, 3, 4, 5, 6, and then compare your results to the values 

T3. Using the exact solution u(x, y) for the temperature distribution described in Exercise T2, use a graphing
program to do the following:

(a) Plot the surface z = u(x, y) in three-dimensional xyz-space in which z is the temperature at the point (x, y) in
the square region.

(b) Plot several isotherms of the temperature distribution (curves in the xy-plane over which the temperature is a
constant).

(c) Plot several curves of the temperature as a function of x with y held constant.

(d) Plot several curves of the temperature as a function of y with x held constant.
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10.12 Computed Tomography 



In this section we will see how constructing a cross-sectional view of a human body by analyzing X-ray scans leads to an inconsistent linear 
system. We present an iteration technique that provides an "approximate solution" of the linear system. 

Prerequisites 

Linear Systems 
Natural Logarithms 
Euclidean Space 



The basic problem of computed tomography is to construct an image of a cross section of the human body using data collected from many 
individual beams of X rays that are passed through the cross section. These data are processed by a computer, and the computed cross section is 
displayed on a video monitor. Figure 10.12.1 is a diagram of General Electric's CT system showing a patient prepared to have a cross section of 
his head scanned by X-ray beams. 




Figure 10.12.1 



Such a system is also known as a CAT scanner, for Computer-Aided Tomography scanner. Figure 10.12.2 shows a typical cross section of a
human head produced by the system. 




Figure 10.12.2 



The first commercial system of computed tomography for medical use was developed in 1971 by G. N. Hounsfield of EMI, Ltd., in England. In 
1979, Hounsfield and A. M. Cormack were awarded the Nobel Prize for their pioneering work in the field. As we will see in this section, the
construction of a cross section, or tomograph, requires the solution of a large linear system of equations. Certain algorithms, called algebraic 
reconstruction techniques (ARTs), can be used to solve these linear systems, whose solutions yield the cross sections in digital form. 



Scanning Modes 

Unlike conventional X-ray pictures that are formed by X rays that are projected perpendicular to the plane of the picture, tomographs are 
constructed from thousands of individual, hairline-thin X-ray beams that lie in the plane of the cross section. After they pass through the cross 
section, the intensities of the X-ray beams are measured by an X-ray detector, and these measurements are relayed to a computer where they are
processed. Figures 10.12.3 and 10.12.4 illustrate two possible modes of scanning the cross section: the parallel mode and the fan-beam mode.
In the parallel mode a single X-ray source and X-ray detector pair are translated across the field of view containing the cross section, and many
measurements of the parallel beams are recorded. Then the source and detector pair are rotated through a small angle, and another set of
measurements is taken. This is repeated until the desired number of beam measurements is completed. For example, in the original 1971
machine, 160 parallel measurements were taken through 180 angles spaced 1° apart: a total of 160 × 180 = 28,800 beam measurements. Each
such scan took approximately 5½ minutes.



Figure 10.12.3 (labels: X-ray detector, X-ray source)




Figure 10.12.4 



In the fan-beam mode of scanning, a single X-ray tube generates a fan of collimated beams whose intensities are measured simultaneously by
an array of detectors on the other side of the field of view. The X-ray tube and detector array are rotated through many angles, and a set of 
measurements is taken at each angle until the scan is completed. In the General Electric CT system, which uses the fan-beam mode, each scan 
takes 1 second. 



Derivation of Equations 

To see how the cross section is reconstructed from the many individual beam measurements, refer to Figure 10.12.5. Here the field of view in 
which the cross section is situated has been divided into many square pixels (picture elements) numbered 1 through N as indicated. It is our
desire to determine the X-ray density of each pixel. In the EMI system, 6400 pixels were used, arranged in a square 80 × 80 array. The G.E. CT
system uses 262,144 pixels in a 512 × 512 array, each pixel being about 1 mm on a side. After the densities of the pixels are determined by the
method we will describe, they are reproduced on a video monitor, with each pixel shaded a level of gray proportional to its X-ray density. 
Because different tissues within the human body have different X-ray densities, the video display clearly distinguishes the various tissues and 
organs within the cross section. 




Figure 10.12.5 



Figure 10.12.6 shows a single pixel with an X-ray beam of roughly the same width as the pixel passing squarely through it. The photons 
constituting the X-ray beam are absorbed by the tissue within the pixel at a rate proportional to the X-ray density of the tissue. Quantitatively, 
the X-ray density of the jth pixel is denoted by $x_j$ and is defined by

\[ x_j = \ln\left(\frac{\text{number of photons entering the } j\text{th pixel}}{\text{number of photons leaving the } j\text{th pixel}}\right) \]

where "ln" denotes the natural logarithmic function. Using the logarithm property $\ln(a/b) = -\ln(b/a)$, we also have

\[ x_j = -\ln\left(\text{fraction of photons that pass through the } j\text{th pixel without being absorbed}\right) \]

Figure 10.12.6

If the X-ray beam passes through an entire row of pixels (Figure 10.12.7), then the number of photons leaving one pixel is equal to the number 
of photons entering the next pixel in the row. If the pixels are numbered 1, 2, ...,n, then the additive property of the logarithmic function gives 

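\[ x_1 + x_2 + \cdots + x_n = \ln\left(\frac{\text{number of photons entering the first pixel}}{\text{number of photons leaving the } n\text{th pixel}}\right) = -\ln\left(\text{fraction of photons that pass through the row of } n \text{ pixels without being absorbed}\right) \tag{1} \]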



Thus, to determine the total X-ray density of a row of pixels, we simply sum the individual pixel densities. 
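
The additivity expressed by Equation 1 can be checked numerically. The following is a minimal Python sketch; the photon counts and the helper name `pixel_density` are hypothetical and chosen only for illustration.

```python
import math

def pixel_density(entering, leaving):
    """X-ray density of one pixel: ln(photons entering / photons leaving)."""
    return math.log(entering / leaving)

# Hypothetical photon counts: entering pixel 1, then leaving pixels 1, 2, and 3.
counts = [1000.0, 800.0, 600.0, 300.0]

# Density of each pixel in the row (photons leaving one pixel enter the next).
densities = [pixel_density(a, b) for a, b in zip(counts, counts[1:])]

# Density of the whole row, computed directly from the first and last counts.
row_density = pixel_density(counts[0], counts[-1])

print(sum(densities), row_density)  # the two values agree up to rounding
```

Because the intermediate counts cancel in the product of the ratios, only the number of photons entering the row and the number leaving it affect the total.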
Figure 10.12.7



Next, consider the X-ray beam in Figure 10.12.5. By the beam density of the ith beam of a scan, denoted by $b_i$, we mean

\[ b_i = \ln\left(\frac{\text{number of photons of the } i\text{th beam entering the detector without the cross section in the field of view}}{\text{number of photons of the } i\text{th beam entering the detector with the cross section in the field of view}}\right) = -\ln\left(\text{fraction of photons of the } i\text{th beam that pass through the cross section without being absorbed}\right) \tag{2} \]



The numerator in the first expression for $b_i$ is obtained by performing a calibration scan without the cross section in the field of view. The
resulting detector measurements are stored within the computer's memory. Then a clinical scan is performed with the cross section in the field
of view, the $b_i$'s of all the beams constituting the scan are computed, and the values are stored for further processing.



For each beam that passes squarely through a row of pixels, we must have

\[ \left(\text{fraction of photons of the beam that pass through the row of pixels without being absorbed}\right) = \left(\text{fraction of photons of the beam that pass through the cross section without being absorbed}\right) \]



Thus, if the ith beam passes squarely through a row of n pixels, then it follows from Equations 1 and 2 that

\[ x_1 + x_2 + \cdots + x_n = b_i \]

In this equation, $b_i$ is known from the clinical and calibration measurements, and $x_1, x_2, \ldots, x_n$ are unknown pixel densities that must be
determined.

More generally, if the ith beam passes squarely through a row (or column) of pixels with numbers $j_1, j_2, \ldots, j_i$, then we have

\[ x_{j_1} + x_{j_2} + \cdots + x_{j_i} = b_i \]

If we set

\[ a_{ij} = \begin{cases} 1, & \text{if } j = j_1, j_2, \ldots, j_i \\ 0, & \text{otherwise} \end{cases} \]

then we can write this equation as

\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{iN}x_N = b_i \tag{3} \]

We will refer to Equation 3 as the ith beam equation.
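
As an illustration of Equation 3, the sketch below builds the coefficient row of one beam equation from the list of pixels the beam crosses squarely. The function names `beam_row` and `beam_density` and the sample densities are hypothetical; they are not part of the text.

```python
def beam_row(pixels_hit, n_pixels):
    """Return [a_i1, ..., a_iN], where a_ij = 1 if pixel j (1-based) is crossed."""
    hit = set(pixels_hit)
    return [1 if j in hit else 0 for j in range(1, n_pixels + 1)]

def beam_density(row, x):
    """Left side of the beam equation: a_i1*x_1 + ... + a_iN*x_N."""
    return sum(a * xj for a, xj in zip(row, x))

# A beam crossing pixels 4, 5, 6 of a 9-pixel field of view.
row = beam_row([4, 5, 6], 9)

# Hypothetical pixel densities x_1, ..., x_9.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

print(row)                   # [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(beam_density(row, x))  # 15.0
```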



Referring to Figure 10.12.5, however, we see that the beams of a scan do not necessarily pass through a row or column of pixels squarely. 
Instead, a typical beam passes diagonally through each pixel in its path. There are many ways to take this into account. In Figure 10.12.8 we 
outline three methods of defining the quantities $a_{ij}$ that appear in Equation 3, each of which reduces to our previous definition when the beam
passes squarely through a row or column of pixels. Reading down the figure, each method is more exact than its predecessor, but with 
successively more computational difficulty. 
Center Line Method

\[ a_{ij} = \frac{\text{length of the center line of the } i\text{th beam that lies in the } j\text{th pixel}}{\text{width of the } j\text{th pixel}} \]

(Figure labels: length of center line, width of pixel.)

Area Method

Figure 10.12.8
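
The center line method can be sketched in code by clipping the center line of a beam against each square pixel. The geometry below is an assumption made for illustration: a 3 × 3 array of unit pixels numbered row by row from the top left, one diagonal beam through the centers of pixels 3, 5, and 7, and a parallel beam whose center line is one pixel width away. The function names are hypothetical.

```python
import math

def center_line_length(point, direction, box):
    """Length of the line point + t*direction inside box = (xmin, ymin, xmax, ymax)."""
    (px, py), (dx, dy) = point, direction
    if dx == 0 and dy == 0:
        raise ValueError("direction must be nonzero")
    xmin, ymin, xmax, ymax = box
    t_lo, t_hi = -math.inf, math.inf
    # Intersect the parameter intervals for the x-slab and the y-slab.
    for p, d, lo, hi in ((px, dx, xmin, xmax), (py, dy, ymin, ymax)):
        if d == 0:
            if p < lo or p > hi:
                return 0.0  # line is parallel to this slab and lies outside it
            continue
        t1, t2 = (lo - p) / d, (hi - p) / d
        t_lo = max(t_lo, min(t1, t2))
        t_hi = min(t_hi, max(t1, t2))
    return max(0.0, t_hi - t_lo) * math.hypot(dx, dy)

def pixel_box(j, size=3):
    """Unit-square box of pixel j (1-based, numbered row by row from the top left)."""
    row, col = divmod(j - 1, size)
    return (col, size - 1 - row, col + 1, size - row)

def center_line_coefficient(point, direction, j):
    """a_ij = (length of center line inside pixel j) / (pixel width, here 1)."""
    return center_line_length(point, direction, pixel_box(j)) / 1.0

diagonal = ((0.0, 0.0), (1.0, 1.0))           # through the centers of pixels 7, 5, 3
offset = ((math.sqrt(2.0), 0.0), (1.0, 1.0))  # parallel line, one pixel width away

print([round(center_line_coefficient(*diagonal, j), 5) for j in (3, 5, 7)])
print([round(center_line_coefficient(*offset, j), 5) for j in (6, 8, 9)])
```

Under these assumptions the diagonal beam gives the coefficient 1.41421 for each of its three pixels, and the offset beam gives .82843, .82843, and .58579, which are the coefficient values that appear in the answer to Exercise 7.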



Using any one of the three methods to define the $a_{ij}$'s in the ith beam equation, we can write the set of M beam equations in a complete scan as

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\ &\;\;\vdots \\ a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M \end{aligned} \tag{4} \]

In this way we have a linear system of M equations (the M beam equations) in N unknowns (the pixel densities).

Depending on the number of beams and pixels used, we can have M > N, M = N, or M < N. We will consider only the case M > N, the
so-called overdetermined case, in which there are more beams in the scan than pixels in the field of view. Because of inherent modeling and
experimental errors in the problem, we should not expect our linear system to have an exact mathematical solution for the pixel densities. In the
next section we attempt to find an "approximate" solution to this linear system.



Algebraic Reconstruction Techniques 

There have been many mathematical algorithms devised to treat the overdetermined linear system 4. The one we will describe belongs to the
class of so-called Algebraic Reconstruction Techniques (ARTs). This method, which can be traced to an iterative technique originally
introduced by S. Kaczmarz in 1937, was the one used in the first commercial machine. To introduce this technique, consider the following
system of three equations in two unknowns:

\[ \begin{aligned} L_1:&\quad x_1 + x_2 = 2 \\ L_2:&\quad x_1 - 2x_2 = -2 \\ L_3:&\quad 3x_1 - x_2 = 3 \end{aligned} \tag{5} \]

The lines $L_1$, $L_2$, $L_3$ determined by these three equations are plotted in the $x_1x_2$-plane. As shown in Figure 10.12.9a, the three lines do not have
a common intersection, and so the three equations do not have an exact solution. However, the points $(x_1, x_2)$ on the shaded triangle formed by
the three lines are all situated "near" these three lines and can be thought of as constituting "approximate" solutions to our system. The
following iterative procedure describes a geometric construction for generating points on the boundary of that triangular region (Figure
10.12.9b):

Algorithm 1

Step 0 Choose an arbitrary starting point $\mathbf{x}_0$ in the $x_1x_2$-plane.

Step 1 Project $\mathbf{x}_0$ orthogonally onto the first line $L_1$ and call the projection $\mathbf{x}_1^{(1)}$. The superscript (1) indicates that this is the first of several
cycles through the steps.

Step 2 Project $\mathbf{x}_1^{(1)}$ orthogonally onto the second line $L_2$ and call the projection $\mathbf{x}_2^{(1)}$.

Step 3 Project $\mathbf{x}_2^{(1)}$ orthogonally onto the third line $L_3$ and call the projection $\mathbf{x}_3^{(1)}$.

Step 4 Take $\mathbf{x}_3^{(1)}$ as the new value of $\mathbf{x}_0$ and cycle through Steps 1 through 3 again. In the second cycle, label the projected points $\mathbf{x}_1^{(2)}, \mathbf{x}_2^{(2)}, \mathbf{x}_3^{(2)}$;
in the third cycle, label the projected points $\mathbf{x}_1^{(3)}, \mathbf{x}_2^{(3)}, \mathbf{x}_3^{(3)}$; and so forth.

This algorithm generates three sequences of points

\[ \begin{aligned} L_1:&\quad \mathbf{x}_1^{(1)}, \mathbf{x}_1^{(2)}, \mathbf{x}_1^{(3)}, \ldots \\ L_2:&\quad \mathbf{x}_2^{(1)}, \mathbf{x}_2^{(2)}, \mathbf{x}_2^{(3)}, \ldots \\ L_3:&\quad \mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \mathbf{x}_3^{(3)}, \ldots \end{aligned} \]

that lie on the three lines $L_1$, $L_2$, and $L_3$, respectively. It can be shown that as long as the three lines are not all parallel, then the first sequence
converges to a point $\mathbf{x}_1^*$ on $L_1$, the second sequence converges to a point $\mathbf{x}_2^*$ on $L_2$, and the third sequence converges to a point $\mathbf{x}_3^*$ on $L_3$ (Figure
10.12.9c). These three limit points form what is called the limit cycle of the iterative process. It can be shown that the limit cycle is independent
of the starting point $\mathbf{x}_0$.



Figure 10.12.9 (panels a, b, c)



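Next we discuss the specific formulas needed to effect the orthogonal projections in Algorithm 1. First, because the equation of a line in $x_1x_2$-space is

\[ a_1x_1 + a_2x_2 = b \]

we can express it in vector form as

\[ \mathbf{a}^T\mathbf{x} = b \]

where

\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

The following theorem gives the necessary projection formula (Exercise 5).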



THEOREM 10.12.1 Orthogonal Projection Formula

Let L be a line in $R^2$ with equation $\mathbf{a}^T\mathbf{x} = b$, and let $\mathbf{x}^*$ be any point in $R^2$ (Figure 10.12.10). Then the orthogonal projection, $\mathbf{x}_p$, of
$\mathbf{x}^*$ onto L is given by

\[ \mathbf{x}_p = \mathbf{x}^* + \frac{b - \mathbf{a}^T\mathbf{x}^*}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a} \]

Figure 10.12.10
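
The projection formula of Theorem 10.12.1 translates directly into a few lines of code. The sketch below is one possible implementation in plain Python; the function name `project` is a hypothetical choice, and vectors are represented as lists.

```python
def project(x_star, a, b):
    """Orthogonal projection of x_star onto the line (or hyperplane) a^T x = b."""
    a_dot_x = sum(ai * xi for ai, xi in zip(a, x_star))
    a_dot_a = sum(ai * ai for ai in a)
    scale = (b - a_dot_x) / a_dot_a
    return [xi + scale * ai for xi, ai in zip(x_star, a)]

# Project the point (1, 3) onto the line x1 + x2 = 2.
x_p = project([1.0, 3.0], [1.0, 1.0], 2.0)
print(x_p)  # [0.0, 2.0]
```

The result lies on the line (0 + 2 = 2), and the displacement from (1, 3) to (0, 2) is a multiple of a = (1, 1), as the theorem requires.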



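EXAMPLE 1 Using Algorithm 1

We can use Algorithm 1 to find an approximate solution of the linear system given in 5 and illustrated in Figure 10.12.9. If we
write the equations of the three lines as

\[ L_1: \mathbf{a}_1^T\mathbf{x} = b_1, \qquad L_2: \mathbf{a}_2^T\mathbf{x} = b_2, \qquad L_3: \mathbf{a}_3^T\mathbf{x} = b_3 \]

where

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mathbf{a}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \quad b_1 = 2, \quad b_2 = -2, \quad b_3 = 3 \]

then, using Theorem 10.12.1, we can express the iteration scheme in Algorithm 1 as

\[ \mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3 \]

where p = 1 for the first cycle of iterates, p = 2 for the second cycle of iterates, and so forth. After each cycle of iterates (i.e.,
after $\mathbf{x}_3^{(p)}$ is computed), the next cycle of iterates is begun with $\mathbf{x}_0^{(p+1)}$ set equal to $\mathbf{x}_3^{(p)}$.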



Table 1 gives the numerical results of six cycles of iterations starting with the initial point $\mathbf{x}_0 = (1, 3)$.

Table 1

Iterate        x1         x2
x_0          1.00000    3.00000
x_1^(1)       .00000    2.00000
x_2^(1)       .40000    1.20000
x_3^(1)      1.30000     .90000
x_1^(2)      1.20000     .80000
x_2^(2)       .88000    1.44000
x_3^(2)      1.42000    1.26000
x_1^(3)      1.08000     .92000
x_2^(3)       .83200    1.41600
x_3^(3)      1.40800    1.22400
x_1^(4)      1.09200     .90800
x_2^(4)       .83680    1.41840
x_3^(4)      1.40920    1.22760
x_1^(5)      1.09080     .90920
x_2^(5)       .83632    1.41816
x_3^(5)      1.40908    1.22724
x_1^(6)      1.09092     .90908
x_2^(6)       .83637    1.41818
x_3^(6)      1.40909    1.22728


Using certain techniques that are impractical for large linear systems, we can show the exact values of the points of the limit cycle
in this example to be

\[ \begin{aligned} \mathbf{x}_1^* &= \left(\tfrac{12}{11}, \tfrac{10}{11}\right) = (1.09090\ldots, .90909\ldots) \\ \mathbf{x}_2^* &= \left(\tfrac{46}{55}, \tfrac{78}{55}\right) = (.83636\ldots, 1.41818\ldots) \\ \mathbf{x}_3^* &= \left(\tfrac{31}{22}, \tfrac{27}{22}\right) = (1.40909\ldots, 1.22727\ldots) \end{aligned} \]

It can be seen that the sixth cycle of iterates provides an excellent approximation to the limit cycle. Any one of the three iterates
$\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, $\mathbf{x}_3^{(6)}$ can be used as an approximate solution of the linear system. (The large discrepancies in the values of $\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, and $\mathbf{x}_3^{(6)}$
are due to the artificial nature of this illustrative example. In practical problems, these discrepancies would be much smaller.)
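
The computations of Example 1 can be reproduced with a short program. The following Python sketch applies Algorithm 1 to system 5 for six cycles starting from (1, 3); the function and variable names are hypothetical, and the rounding to five decimal places is chosen to match Table 1.

```python
def project(x, a, b):
    """Orthogonal projection of x onto the line a^T x = b (Theorem 10.12.1)."""
    scale = (b - sum(ai * xi for ai, xi in zip(a, x))) / sum(ai * ai for ai in a)
    return [xi + scale * ai for xi, ai in zip(x, a)]

# The three lines L1, L2, L3 of system 5, stored as (a_k, b_k).
lines = [([1.0, 1.0], 2.0), ([1.0, -2.0], -2.0), ([3.0, -1.0], 3.0)]

x = [1.0, 3.0]   # starting point x_0
cycles = []      # cycles[p-1][k-1] holds the iterate x_k^(p)
for p in range(1, 7):
    cycle = []
    for a, b in lines:
        x = project(x, a, b)
        cycle.append(x)
    cycles.append(cycle)

for p, cycle in enumerate(cycles, start=1):
    print(p, [[round(v, 5) for v in point] for point in cycle])
```

Each printed row lists the three iterates of one cycle, so the rows can be compared directly with the entries of Table 1.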



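To generalize Algorithm 1 so that it applies to an overdetermined system of M equations in N unknowns,

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\ &\;\;\vdots \\ a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M \end{aligned} \tag{6} \]

we introduce column vectors $\mathbf{x}$ and $\mathbf{a}_i$ as follows:

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \qquad \mathbf{a}_i = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{iN} \end{bmatrix}, \qquad i = 1, 2, \ldots, M \]

With these vectors, the M equations constituting our linear system 6 can be written in vector form as

\[ \mathbf{a}_i^T\mathbf{x} = b_i, \qquad i = 1, 2, \ldots, M \]

Each of these M equations defines what is called a hyperplane in the N-dimensional Euclidean space $R^N$. In general these M hyperplanes have
no common intersection, and so we seek instead some point in $R^N$ that is reasonably "close" to all of them. Such a point will constitute an
approximate solution of the linear system, and its entries will determine approximate pixel densities with which to form the desired cross
section.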



As in the two-dimensional case, we will introduce an iterative process that generates cycles of successive orthogonal projections onto the M
hyperplanes beginning with some arbitrary initial point in $R^N$. Our notation for these successive iterates is

\[ \mathbf{x}_k^{(p)} = \left(\text{the iterate lying on the } k\text{th hyperplane generated during the } p\text{th cycle of iterations}\right) \]

The algorithm is as follows:

Algorithm 2

Step 0 Choose any point in $R^N$ and label it $\mathbf{x}_0^{(1)}$.

Step 1 For the first cycle of iterates, set p = 1.

Step 2 For k = 1, 2, ..., M, compute

\[ \mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k \]

Step 3 Set $\mathbf{x}_0^{(p+1)} = \mathbf{x}_M^{(p)}$.

Step 4 Increase the cycle number p by 1 and return to Step 2.

In Step 2 the iterate $\mathbf{x}_k^{(p)}$ is called the orthogonal projection of $\mathbf{x}_{k-1}^{(p)}$ onto the hyperplane $\mathbf{a}_k^T\mathbf{x} = b_k$. Consequently, as in the two-dimensional
case, this algorithm determines a sequence of orthogonal projections from one hyperplane onto the next in which we cycle back to the first
hyperplane after each projection onto the last hyperplane.

It can be shown that if the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$, then the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ lying on the Mth hyperplane will converge to a
point $\mathbf{x}_M^*$ on that hyperplane which does not depend on the choice of the initial point $\mathbf{x}_0^{(1)}$. In computed tomography, one of the iterates $\mathbf{x}_M^{(p)}$ for p
sufficiently large is taken as an approximate solution of the linear system for the pixel densities.

Note that for the center-of-pixel method, the scalar quantity $\mathbf{a}_k^T\mathbf{a}_k$ appearing in the equation in Step 2 of the algorithm is simply the number of
pixels in which the kth beam passes through the center. Similarly, note that the scalar quantity

\[ b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)} \]

in that same equation can be interpreted as the excess kth beam density that results if the pixel densities are set equal to the entries of $\mathbf{x}_{k-1}^{(p)}$. This
provides the following interpretation of our ART iteration scheme for the center-of-pixel method: Generate the pixel densities of each iterate by
distributing the excess beam density of successive beams in the scan evenly among those pixels in which the beam passes through the center.
When the last beam in the scan has been reached, return to the first beam and continue.
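
Algorithm 2 can be sketched as a short routine. The Python code below is an illustration only: the names `dot` and `art` are hypothetical, and the test data (a 2 × 2 pixel grid with densities 1, 4, 2, 7 scanned by two row beams, two column beams, and one diagonal beam) are invented so that the system is consistent and the exact answer is known.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def art(A, b, x0, cycles):
    """Run `cycles` full cycles of Algorithm 2 and return the last iterate x_M^(p)."""
    x = list(x0)
    for _ in range(cycles):
        for a_k, b_k in zip(A, b):
            excess = b_k - dot(a_k, x)      # excess kth beam density
            share = excess / dot(a_k, a_k)  # for 0/1 rows: excess per pixel crossed
            x = [xj + share * aj for xj, aj in zip(x, a_k)]
    return x

# Hypothetical beams for pixels numbered 1 2 / 3 4 (center-of-pixel coefficients).
A = [
    [1, 1, 0, 0],  # top row
    [0, 0, 1, 1],  # bottom row
    [1, 0, 1, 0],  # left column
    [0, 1, 0, 1],  # right column
    [1, 0, 0, 1],  # diagonal through pixels 1 and 4
]
b = [5.0, 9.0, 3.0, 11.0, 8.0]  # beam densities produced by densities (1, 4, 2, 7)

x = art(A, b, [0.0, 0.0, 0.0, 0.0], cycles=50)
print([round(v, 6) for v in x])
```

Because every row of A contains only 0s and 1s, the update adds the same share of the excess to each pixel the beam crosses, which is the interpretation given above for the center-of-pixel method. In this consistent example the iterates approach the densities (1, 4, 2, 7) used to generate the data.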

EXAMPLE 2 Using Algorithm 2

We can use Algorithm 2 to find the unknown pixel densities of the 9 pixels arranged in the 3 × 3 array illustrated in Figure
10.12.11. These 9 pixels are scanned using the parallel mode with 12 beams whose measured beam densities are indicated in the
figure. We choose the center-of-pixel method to set up the 12 beam equations. (In Exercises 7 and 8, you are asked to set up the
beam equations using the center line and area methods.) As you can verify, the beam equations are

\[ \begin{aligned} x_7 + x_8 + x_9 &= 13.00 & \qquad x_3 + x_6 + x_9 &= 18.00 \\ x_4 + x_5 + x_6 &= 15.00 & \qquad x_2 + x_5 + x_8 &= 12.00 \\ x_1 + x_2 + x_3 &= 8.00 & \qquad x_1 + x_4 + x_7 &= 6.00 \\ x_6 + x_8 + x_9 &= 14.79 & \qquad x_2 + x_3 + x_6 &= 10.51 \\ x_3 + x_5 + x_7 &= 14.31 & \qquad x_1 + x_5 + x_9 &= 16.13 \\ x_1 + x_2 + x_4 &= 3.81 & \qquad x_4 + x_7 + x_8 &= 7.04 \end{aligned} \]

(The equations are numbered 1 through 6 down the left column and 7 through 12 down the right column.)

Table 2 illustrates the results of the iteration scheme starting with an initial iterate $\mathbf{x}_0^{(1)} = \mathbf{0}$. The table gives the values of each of the first
cycle of iterates, $\mathbf{x}_1^{(1)}$ through $\mathbf{x}_{12}^{(1)}$, but thereafter gives the iterates $\mathbf{x}_{12}^{(p)}$ only for various values of p. The iterates $\mathbf{x}_{12}^{(p)}$ start
repeating to two decimal places for p ≥ 45, and so we take the entries of $\mathbf{x}_{12}^{(45)}$ as approximate values of the 9 pixel densities.




Figure 10.12.11 



Table 2

                         Pixel Densities
Iterate       x1     x2     x3     x4     x5     x6     x7     x8     x9
x_0^(1)      .00    .00    .00    .00    .00    .00    .00    .00    .00
x_1^(1)      .00    .00    .00    .00    .00    .00   4.33   4.33   4.33
x_2^(1)      .00    .00    .00   5.00   5.00   5.00   4.33   4.33   4.33
x_3^(1)     2.67   2.67   2.67   5.00   5.00   5.00   4.33   4.33   4.33
x_4^(1)     2.67   2.67   2.67   5.00   5.00   5.37   4.33   4.71   4.71
x_5^(1)     2.67   2.67   3.44   5.00   5.77   5.37   5.10   4.71   4.71
x_6^(1)      .49    .49   3.44   2.83   5.77   5.37   5.10   4.71   4.71
x_7^(1)      .49    .49   4.93   2.83   5.77   6.87   5.10   4.71   6.20
x_8^(1)      .49    .84   4.93   2.83   6.11   6.87   5.10   5.05   6.20
x_9^(1)     -.31    .84   4.93   2.02   6.11   6.87   4.30   5.05   6.20
x_10^(1)    -.31    .13   4.22   2.02   6.11   6.16   4.30   5.05   6.20
x_11^(1)    1.06    .13   4.22   2.02   7.49   6.16   4.30   5.05   7.58
x_12^(1)    1.06    .13   4.22    .58   7.49   6.16   2.85   3.61   7.58
x_12^(2)    2.03    .69   4.42   1.34   7.49   5.39   2.65   3.04   6.61
x_12^(3)    1.78    .51   4.52   1.26   7.49   5.48   2.56   3.22   6.86
x_12^(4)    1.82    .52   4.62   1.37   7.49   5.37   2.45   3.22   6.82
x_12^(5)    1.79    .49   4.71   1.43   7.49   5.31   2.37   3.25   6.85
x_12^(10)   1.68    .44   5.03   1.70   7.49   5.03   2.0?   3.29   6.96
x_12^(20)   1.49    .48   5.29   2.00   7.49   4.73   1.79   3.25   7.15
x_12^(30)   1.38    .55   5.34   2.11   7.49   4.62   1.74   3.19   7.26
x_12^(40)   1.33    .59   5.33   2.14   7.49   4.59   1.75   3.15   7.31
x_12^(45)   1.32    .60   5.32   2.15   7.49   4.59   1.76   3.14   7.32

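
The first cycle of Table 2 can be reproduced with a few lines of code. The sketch below encodes each of the 12 center-of-pixel beam equations of Example 2 as a list of pixel numbers and a measured beam density, then performs one cycle of Algorithm 2 from the zero vector. The data layout and the name `one_cycle` are hypothetical choices.

```python
# (pixels crossed through the center, measured beam density) for beams 1..12
beams = [
    ((7, 8, 9), 13.00), ((4, 5, 6), 15.00), ((1, 2, 3), 8.00),
    ((6, 8, 9), 14.79), ((3, 5, 7), 14.31), ((1, 2, 4), 3.81),
    ((3, 6, 9), 18.00), ((2, 5, 8), 12.00), ((1, 4, 7), 6.00),
    ((2, 3, 6), 10.51), ((1, 5, 9), 16.13), ((4, 7, 8), 7.04),
]

def one_cycle(x, beams):
    """One cycle of Algorithm 2; returns the list of iterates x_1, ..., x_M."""
    iterates = []
    x = list(x)
    for pixels, b_k in beams:
        # Excess beam density, shared evenly among the pixels the beam crosses.
        excess = b_k - sum(x[j - 1] for j in pixels)
        for j in pixels:
            x[j - 1] += excess / len(pixels)
        iterates.append(list(x))
    return iterates

first_cycle = one_cycle([0.0] * 9, beams)
for k, it in enumerate(first_cycle, start=1):
    print(k, [round(v, 2) for v in it])
```

Calling `one_cycle` repeatedly, each time starting from the last iterate of the previous cycle, gives the later iterates $\mathbf{x}_{12}^{(p)}$ for comparison with the remaining rows of the table.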



We close this section by noting that the field of computed tomography is presently a very active research area. In fact, the ART scheme 
discussed here has been replaced in commercial systems by more sophisticated techniques that are faster and provide a more accurate view of 
the cross section. However, all the new techniques address the same basic mathematical problem: finding a good approximate solution of a 
large overdetermined inconsistent linear system of equations. 



Exercise Set 10.12 

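1. (a) Setting $\mathbf{x}_k^{(p)} = (x_{k1}^{(p)}, x_{k2}^{(p)})$, show that the three projection equations

\[ \mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3 \]

for the three lines in Equation 5 can be written as

\[ \begin{aligned} k = 1:&\quad x_{11}^{(p)} = \tfrac{1}{2}\left[2 + x_{01}^{(p)} - x_{02}^{(p)}\right], &\quad x_{12}^{(p)} &= \tfrac{1}{2}\left[2 - x_{01}^{(p)} + x_{02}^{(p)}\right] \\ k = 2:&\quad x_{21}^{(p)} = \tfrac{1}{5}\left[-2 + 4x_{11}^{(p)} + 2x_{12}^{(p)}\right], &\quad x_{22}^{(p)} &= \tfrac{1}{5}\left[4 + 2x_{11}^{(p)} + x_{12}^{(p)}\right] \\ k = 3:&\quad x_{31}^{(p)} = \tfrac{1}{10}\left[9 + x_{21}^{(p)} + 3x_{22}^{(p)}\right], &\quad x_{32}^{(p)} &= \tfrac{1}{10}\left[-3 + 3x_{21}^{(p)} + 9x_{22}^{(p)}\right] \end{aligned} \]

where $(x_{01}^{(p+1)}, x_{02}^{(p+1)}) = (x_{31}^{(p)}, x_{32}^{(p)})$ for p = 1, 2, ....

(b) Show that the three pairs of equations in part (a) can be combined to produce

\[ \begin{aligned} x_{31}^{(p)} &= \tfrac{1}{20}\left[28 + x_{31}^{(p-1)} - x_{32}^{(p-1)}\right] \\ x_{32}^{(p)} &= \tfrac{1}{20}\left[24 + 3x_{31}^{(p-1)} - 3x_{32}^{(p-1)}\right] \end{aligned} \qquad p = 1, 2, \ldots \]

where $(x_{31}^{(0)}, x_{32}^{(0)}) = (x_{01}^{(1)}, x_{02}^{(1)}) = \mathbf{x}_0^{(1)}$. [Note: Using this pair of equations, we can perform one complete cycle of three orthogonal
projections in a single step.]

(c) Because $\mathbf{x}_3^{(p)}$ tends to the limit point $\mathbf{x}_3^*$ as $p \to \infty$, the equations in part (b) become

\[ \begin{aligned} x_{31}^* &= \tfrac{1}{20}\left[28 + x_{31}^* - x_{32}^*\right] \\ x_{32}^* &= \tfrac{1}{20}\left[24 + 3x_{31}^* - 3x_{32}^*\right] \end{aligned} \]

as $p \to \infty$. Solve this linear system for $\mathbf{x}_3^* = (x_{31}^*, x_{32}^*)$. [Note: The simplifications of the ART formulas described in this exercise are
impractical for the large linear systems that arise in realistic computed tomography problems.]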



Answer:

(c) $\mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$



2. Use the result of Exercise 1(b) to find $\mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \ldots, \mathbf{x}_3^{(6)}$ to five decimal places in Example 1 using the following initial points:

(a) $\mathbf{x}_0 = (0, 0)$

(b) $\mathbf{x}_0 = (1, 1)$

(c) $\mathbf{x}_0 = (148, -15)$

Answer:

(a) x_3^(1) = (1.40000, 1.20000)
    x_3^(2) = (1.41000, 1.23000)
    x_3^(3) = (1.40900, 1.22700)
    x_3^(4) = (1.40910, 1.22730)
    x_3^(5) = (1.40909, 1.22727)
    x_3^(6) = (1.40909, 1.22727)

(b) Same as part (a)

(c) x_3^(1) = (9.55000, 25.65000)
    x_3^(2) = (.59500, -1.21500)
    x_3^(3) = (1.49050, 1.47150)
    x_3^(4) = (1.40095, 1.20285)
    x_3^(5) = (1.40991, 1.22972)
    x_3^(6) = (1.40901, 1.22703)
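
The one-step cycle formulas of Exercise 1(b) are easy to iterate by machine, which gives a way to check answers such as those above. The following Python sketch is a hypothetical helper, not part of the exercise; it applies the two formulas six times from the initial point of part (c).

```python
def next_cycle(x31, x32):
    """One complete cycle of three projections, using the formulas of Exercise 1(b)."""
    return ((28 + x31 - x32) / 20, (24 + 3 * x31 - 3 * x32) / 20)

point = (148.0, -15.0)  # initial point of part (c)
history = []
for p in range(1, 7):
    point = next_cycle(*point)
    history.append(point)
    print(p, point)
```

The first printed pair is close to (9.55, 25.65) and the sixth is close to (1.40901, 1.22703), in line with the listed answer to part (c).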

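3. (a) Show directly that the points of the limit cycle in Example 1,

\[ \mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \qquad \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \qquad \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right) \]

form a triangle whose vertices lie on the lines $L_1$, $L_2$, and $L_3$ and whose sides are perpendicular to these lines (Figure 10.12.9c).

(b) Using the equations derived in Exercise 1(a), show that if $\mathbf{x}_0^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$, then

\[ \mathbf{x}_1^{(1)} = \mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \qquad \mathbf{x}_2^{(1)} = \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \qquad \mathbf{x}_3^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right) \]

[Note: Either part of this exercise shows that successive orthogonal projections of any point on the limit cycle will move around the
limit cycle indefinitely.]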



4. The following three lines in the $x_1x_2$-plane,

\[ \begin{aligned} L_1:&\quad x_2 = 1 \\ L_2:&\quad x_1 - x_2 = 2 \\ L_3:&\quad x_1 - x_2 = 0 \end{aligned} \]

do not have a common intersection. Draw an accurate sketch of the three lines and graphically perform several cycles of the orthogonal
projections described in Algorithm 1, beginning with the initial point $\mathbf{x}_0 = (0, 0)$. On the basis of your sketch, determine the three points of
the limit cycle.

Answer:

$\mathbf{x}_1^* = (1, 1)$, $\mathbf{x}_2^* = (2, 0)$, $\mathbf{x}_3^* = (1, 1)$

5. Prove Theorem 10.12.1 by verifying that

(a) the point $\mathbf{x}_p$ as defined in the theorem lies on the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{a}^T\mathbf{x}_p = b$).

(b) the vector $\mathbf{x}_p - \mathbf{x}^*$ is orthogonal to the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{x}_p - \mathbf{x}^*$ is parallel to $\mathbf{a}$).

6. As stated in the text, the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ defined in Algorithm 2 will converge to a unique limit point $\mathbf{x}_M^*$ if the vectors
$\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$. Show that if this is the case and if the center-of-pixel method is used, then the center of each of the N pixels in the
field of view is crossed by at least one of the M beams in the scan.

7. Construct the 12 beam equations in Example 2 using the center line method. Assume that the distance between the center lines of adjacent 
beams is equal to the width of a single pixel. 

Answer:

x7 + x8 + x9 = 13.00
x4 + x5 + x6 = 15.00
x1 + x2 + x3 = 8.00
.82843(x6 + x8) + .58579x9 = 14.79
1.41421(x3 + x5 + x7) = 14.31
.82843(x2 + x4) + .58579x1 = 3.81
x3 + x6 + x9 = 18.00
x2 + x5 + x8 = 12.00
x1 + x4 + x7 = 6.00
.82843(x2 + x6) + .58579x3 = 10.51
1.41421(x1 + x5 + x9) = 16.13
.82843(x4 + x8) + .58579x7 = 7.04

8. Construct the 12 beam equations in Example 2 using the area method. Assume that the width of each beam is equal to the width of a single 
pixel and that the distance between the center lines of adjacent beams is also equal to the width of a single pixel. 

Answer:

x7 + x8 + x9 = 13.00
x4 + x5 + x6 = 15.00
x1 + x2 + x3 = 8.00
.04289(x3 + x5 + x7) + .75000(x6 + x8) + .61396x9 = 14.79
.91421(x3 + x5 + x7) + .25000(x2 + x4 + x6 + x8) = 14.31
.04289(x3 + x5 + x7) + .75000(x2 + x4) + .61396x1 = 3.81
x3 + x6 + x9 = 18.00
x2 + x5 + x8 = 12.00
x1 + x4 + x7 = 6.00
.04289(x1 + x5 + x9) + .75000(x2 + x6) + .61396x3 = 10.51
.91421(x1 + x5 + x9) + .25000(x2 + x4 + x6 + x8) = 16.13
.04289(x1 + x5 + x9) + .75000(x4 + x8) + .61396x7 = 7.04

Section 10.12 Technology Exercises

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or 
Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each 
exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your 
technology utility to solve many of the problems in the regular exercise sets. 

T1. Given the set of equations

\[ a_kx + b_ky = c_k \]

for k = 1, 2, 3, ..., n (with n > 2), let us consider the following algorithm for obtaining an approximate solution to the system.

1. Solve all possible pairs of equations

\[ a_ix + b_iy = c_i \quad \text{and} \quad a_jx + b_jy = c_j \]

for i, j = 1, 2, 3, ..., n and i < j for their unique solutions. This leads to

\[ \tfrac{1}{2}n(n-1) \]

solutions, which we label as

\[ (x_{ij}, y_{ij}) \]

for i, j = 1, 2, 3, ..., n and i < j.

2. Construct the geometric center of these points defined by

\[ (x_c, y_c) = \left(\frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_{ij},\; \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} y_{ij}\right) \]

and use this as the approximate solution to the original system.

Use this algorithm to approximate the solution to the system

\[ \begin{aligned} x + y &= 2 \\ x - 2y &= -2 \\ 3x - y &= 3 \end{aligned} \]

and compare your results to those in this section.
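T2. (Calculus required) Given the set of equations

\[ a_kx + b_ky = c_k \]

for k = 1, 2, 3, ..., n (with n > 2), let us consider the following least squares algorithm for obtaining an approximate solution $(x^*, y^*)$ to the
system. Given a point $(\alpha, \beta)$ and the line $a_ix + b_iy = c_i$, the distance from this point to the line is given by

\[ \frac{|a_i\alpha + b_i\beta - c_i|}{\sqrt{a_i^2 + b_i^2}} \]

If we define a function $f(x, y)$ by

\[ f(x, y) = \sum_{i=1}^{n} \frac{(a_ix + b_iy - c_i)^2}{a_i^2 + b_i^2} \]

and then determine the point $(x^*, y^*)$ that minimizes this function, we will determine the point that is closest to each of these lines in a
summed least squares sense. Show that $x^*$ and $y^*$ are solutions to the system

\[ \left(\sum_{i=1}^{n}\frac{a_i^2}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n}\frac{a_ib_i}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n}\frac{a_ic_i}{a_i^2 + b_i^2} \]

and

\[ \left(\sum_{i=1}^{n}\frac{a_ib_i}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n}\frac{b_i^2}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n}\frac{b_ic_i}{a_i^2 + b_i^2} \]

Apply this algorithm to the system

\[ \begin{aligned} x + y &= 2 \\ x - 2y &= -2 \\ 3x - y &= 3 \end{aligned} \]

and compare your results to those in this section.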
10.13 Fractals



In this section we will use certain classes of linear transformations to describe and generate intricate sets in the Euclidean plane. These sets, called fractals, are 
currently the focus of much mathematical and scientific research. 



Prerequisites 

Geometry of Linear Operators on $R^2$ (Section 4.11)

Euclidean Space 

Natural Logarithms 

Intuitive Understanding of Limits 



At the end of the nineteenth century and the beginning of the twentieth century, various bizarre and wild sets of points in the Euclidean plane began appearing in 
mathematics. Although they were initially mathematical curiosities, these sets, called fractals, are rapidly growing in importance. It is now recognized that they reveal 
a regularity in physical and biological phenomena previously dismissed as "random," "noisy," or "chaotic." For example, fractals are all around us in the shapes of 
clouds, mountains, coastlines, trees, and ferns. 

In this section we give a brief description of certain types of fractals in the Euclidean plane $R^2$. Much of this description is an outgrowth of the work of two
mathematicians, Benoit B. Mandelbrot and Michael Barnsley, who are both active researchers in the field.



To begin our study of fractals, we need to introduce some terminology about sets in $R^2$. We will call a set in $R^2$ bounded if it can be enclosed by a suitably large
circle (Figure 10.13.1) and closed if it contains all of its boundary points (Figure 10.13.2). Two sets in $R^2$ will be called congruent if they can be made to coincide
exactly by translating and rotating them appropriately within $R^2$ (Figure 10.13.3). We will also rely on your intuitive concept of overlapping and nonoverlapping
sets, as illustrated in Figure 10.13.4. 



Fractals in the Euclidean Plane 



Self-Similar Sets 



Figure 10.13.1 (a) Set enclosed by a circle (bounded set). (b) This set cannot be enclosed by any circle (unbounded set).

Figure 10.13.2 The boundary points (solid color) lie in the set.

Figure 10.13.3

Figure 10.13.4 (a) Overlapping sets. (b) Nonoverlapping sets.



If $T: R^2 \to R^2$ is the linear operator that scales by a factor of s (see Table 7 of Section 4.9), and if Q is a set in $R^2$, then the set T(Q) (the set of images of points in Q
under T) is called a dilation of the set Q if s > 1 and a contraction of Q if 0 < s < 1 (Figure 10.13.5). In either case we say that T(Q) is the set Q scaled by the factor
s.




Figure 10.13.5 A contraction of Q. 



The types of fractals we will consider first are called self-similar. In general, we define a self-similar set in $R^2$ as follows:

DEFINITION 1

A closed and bounded subset of the Euclidean plane $R^2$ is said to be self-similar if it can be expressed in the form

\[ S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k \tag{1} \]

where $S_1, S_2, \ldots, S_k$ are nonoverlapping sets, each of which is congruent to S scaled by the same factor s (0 < s < 1).

If S is a self-similar set, then 1 is sometimes called a decomposition of S into nonoverlapping congruent sets.

EXAMPLE 1 Line Segment

A line segment in $R^2$ (Figure 10.13.6a) can be expressed as the union of two nonoverlapping congruent line segments (Figure 10.13.6b). In Figure
10.13.6b we have separated the two line segments slightly so that they can be seen more easily. Each of these two smaller line segments is congruent to
the original line segment scaled by a factor of 1/2. Hence, a line segment is a self-similar set with k = 2 and s = 1/2.



Figure 10.13.6 



EXAMPLE 2 Square

A square (Figure 10.13.7a) can be expressed as the union of four nonoverlapping congruent squares (Figure 10.13.7b), where we have again separated
the smaller squares slightly. Each of the four smaller squares is congruent to the original square scaled by a factor of 1/2. Hence, a square is a self-similar
set with k = 4 and s = 1/2.




Figure 10.13.7 



EXAMPLE 3 Sierpinski Carpet

The set suggested by Figure 10.13.8a, the Sierpinski "carpet," was first described by the Polish mathematician Waclaw Sierpinski (1882-1969). It can
be expressed as the union of eight nonoverlapping congruent subsets (Figure 10.13.8b), each of which is congruent to the original set scaled by a factor
of 1/3. Hence, it is a self-similar set with k = 8 and s = 1/3. Note that the intricate square-within-a-square pattern continues forever on a smaller and
smaller scale (although this can only be suggested in a figure such as the one shown).

Figure 10.13.8 (panels a, b)



EXAMPLE 4 Sierpinski Triangle

Figure 10.13.9a illustrates another set described by Sierpinski. It is a self-similar set with k = 3 and s = 1/2 (Figure 10.13.9b). As with the Sierpinski
carpet, the intricate triangle-within-a-triangle pattern continues forever on a smaller and smaller scale.



Figure 10.13.9 



The Sierpinski carpet and triangle have a more intricate structure than the line segment and the square in that they exhibit a pattern that is repeated indefinitely. This
difference will be explored later in this section.



Topological Dimension of a Set 

In Section 4.5 we defined the dimension of a subspace of a vector space to be the number of vectors in a basis, and we found that definition to coincide with our 
intuitive sense of dimension. For example, the origin of $R^2$ is zero-dimensional, lines through the origin are one-dimensional, and $R^2$ itself is two-dimensional. This
definition of dimension is a special case of a more general concept called topological dimension, which is applicable to sets in $R^n$ that are not necessarily subspaces.
A precise definition of this concept is studied in a branch of mathematics called topology. Although that definition is beyond the scope of this text, we can state
informally that

• a point in $R^2$ has topological dimension zero;

• a curve in $R^2$ has topological dimension one;

• a region in $R^2$ has topological dimension two.

It can be proved that the topological dimension of a set in $R^n$ must be an integer between 0 and n, inclusive. In this text we will denote the topological dimension of a set S by $d_T(S)$.

EXAMPLE 5 Topological Dimensions of Sets

Table 1 gives the topological dimensions of the sets studied in our earlier examples. The first two results in this table are intuitively obvious; however,
the last two are not. Informally stated, the Sierpinski carpet and triangle both contain so many "holes" that those sets resemble web-like networks of
lines rather than regions. Hence they have topological dimension one. The proofs are quite difficult.

Table 1

Set S                   d_T(S)
Line segment              1
Square                    2
Sierpinski carpet         1
Sierpinski triangle       1



Hausdorff Dimension of a Self-Similar Set 

In 1919 the German mathematician Felix Hausdorff (1868-1942) gave an alternative definition for the dimension of an arbitrary set in $R^n$. His definition is quite
complicated, but for a self-similar set, it reduces to something rather simple:



DEFINITION 2

The Hausdorff dimension of a self-similar set $S$ of form (1) is denoted by $d_H(S)$ and is defined by

$$d_H(S) = \frac{\ln k}{\ln(1/s)} \qquad (2)$$

In this definition, "ln" denotes the natural logarithm function. Equation (2) can also be expressed as

$$s^{d_H(S)} = \frac{1}{k} \qquad (3)$$

in which the Hausdorff dimension $d_H(S)$ appears as an exponent. Formula (3) is more helpful for interpreting the concept of Hausdorff dimension; it states, for
example, that if you scale a self-similar set by a factor of $s = \frac{1}{2}$, then its measure decreases by a factor of $\left(\frac{1}{2}\right)^{d_H(S)}$. Thus, scaling a line
segment by a factor of $\frac{1}{2}$ reduces its measure (length) by a factor of $\left(\frac{1}{2}\right)^1 = \frac{1}{2}$, and scaling a square region by a factor of $\frac{1}{2}$ reduces its measure (area) by a factor of
$\left(\frac{1}{2}\right)^2 = \frac{1}{4}$.



Before proceeding to some examples, we should note a few facts about the Hausdorff dimension of a set: 

• The topological dimension and Hausdorff dimension of a set need not be the same. 

• The Hausdorff dimension of a set need not be an integer. 

• The topological dimension of a set is less than or equal to its Hausdorff dimension; that is, $d_T(S) \le d_H(S)$.

EXAMPLE 6 Hausdorff Dimensions of Sets

Table 2 lists the Hausdorff dimensions of the sets studied in our earlier examples. 

Table 2

Set S                  s      k     d_H(S) = ln k / ln(1/s)
Line segment           1/2    2     ln 2 / ln 2 = 1
Square                 1/2    4     ln 4 / ln 2 = 2
Sierpinski carpet      1/3    8     ln 8 / ln 3 = 1.892...
Sierpinski triangle    1/2    3     ln 3 / ln 2 = 1.584...
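The entries of Table 2 can be checked numerically. The following Python sketch evaluates Formula (2) for each of the four sets; the function name `hausdorff_dimension` and the dictionary layout are illustrative choices, not part of the text.

```python
import math

def hausdorff_dimension(k: int, s: float) -> float:
    """Evaluate ln(k) / ln(1/s) for a set made of k nonoverlapping copies scaled by s."""
    if k < 1 or not (0 < s < 1):
        raise ValueError("need k >= 1 and 0 < s < 1")
    return math.log(k) / math.log(1 / s)

# (scale factor s, number of pieces k) for the sets in Table 2
table_2 = {
    "Line segment": (1 / 2, 2),
    "Square": (1 / 2, 4),
    "Sierpinski carpet": (1 / 3, 8),
    "Sierpinski triangle": (1 / 2, 3),
}

for name, (s, k) in table_2.items():
    print(f"{name:20s} d_H = {hausdorff_dimension(k, s):.4f}")
```

The printed values can be compared with the last column of Table 2.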



Fractals 



Comparing Tables 1 and 2, we see that the Hausdorff and topological dimensions are equal for both the line segment and square but are unequal for the Sierpinski 
carpet and triangle. In 1977 Benoit B. Mandelbrot suggested that sets for which the topological and Hausdorff dimensions differ must be quite complicated (as 
Hausdorff had earlier suggested in 1919). Mandelbrot proposed calling such sets fractals, and he offered the following definition. 




DEFINITION 3 

A fractal is a subset of a Euclidean space whose Hausdorff dimension and topological dimension are not equal. 



According to this definition, the Sierpinski carpet and Sierpinski triangle are fractals, whereas the line segment and square are not.

It follows from the preceding definition that a set whose Hausdorff dimension is not an integer must be a fractal (why?). However, we will see later that the converse 
is not true; that is, it is possible for a fractal to have an integer Hausdorff dimension. 

Similitudes 

We will now show how some techniques from linear algebra can be used to generate fractals. This linear algebra approach also leads to algorithms that can be 
exploited to draw fractals on a computer. We begin with a definition. 



DEFINITION 4 

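A similitude with scale factor $s$ is a mapping of $R^2$ into $R^2$ of the form

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = s\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$$

where $s$, $\theta$, $e$, and $f$ are scalars.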



Geometrically, a similitude is a composition of three simpler mappings: a scaling by a factor of $s$, a rotation about the origin through an angle $\theta$, and a translation ($e$
units in the x-direction and $f$ units in the y-direction). Figure 10.13.10 illustrates the effect of a similitude on the unit square $U$.
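The three-stage description above (scale, rotate, translate) can be sketched in a few lines of Python. The helper name `similitude` and the sample parameter values are assumptions made for illustration only.

```python
import math

def similitude(s, theta, e, f):
    """Return the map T(x, y) = s * R(theta) * (x, y) + (e, f), with theta in radians."""
    c, sn = math.cos(theta), math.sin(theta)

    def T(point):
        x, y = point
        # rotate, then scale, then translate
        return (s * (c * x - sn * y) + e, s * (sn * x + c * y) + f)

    return T

# Hypothetical example: scale by 1/2, rotate 90 degrees, shift 1 unit in the x-direction
T = similitude(0.5, math.pi / 2, 1.0, 0.0)
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [T(p) for p in unit_square]
print(image)
```

Every side of the image square has length $s$ times the original side length, which is what "congruent to the original scaled by $s$" means for this example.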



Figure 10.13.10 Effect of a similitude $T$ on the unit square: (a) unit square; (b) unit square after similitude (scaling by $s$, rotation through $\theta$, translation).

For our application to fractals, we will need only similitudes that are contractions, by which we mean that the scale factor $s$ is restricted to the range $0 < s < 1$.
Consequently, when we refer to similitudes we will always mean similitudes subject to this restriction.



Similitudes are important in the study of fractals because of the following fact: 



If $T: R^2 \to R^2$ is a similitude with scale factor $s$ and if $S$ is a closed and bounded set in $R^2$, then the image $T(S)$ of the set $S$ under $T$ is congruent to $S$ scaled
by $s$.

Recall from the definition of a self-similar set in $R^2$ that a closed and bounded set $S$ in $R^2$ is self-similar if it can be expressed in the form

$$S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k$$

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets each of which is congruent to $S$ scaled by the same factor $s$ ($0 < s < 1$) [see (1)]. In the following examples, we will
find similitudes that produce the sets $S_1, S_2, S_3, \ldots, S_k$ from $S$ for the line segment, square, Sierpinski carpet, and Sierpinski triangle.

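EXAMPLE 7 Line Segment

We will take as our line segment the line segment $S$ connecting the points (0, 0) and (1, 0) in the xy-plane (Figure 10.13.11a). Consider the two
similitudes

$$T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ 0 \end{bmatrix}$$

both of which have $s = \frac{1}{2}$ and $\theta = 0$. In Figure 10.13.11b we show how these two similitudes map the unit square $U$. The similitude $T_1$ maps $U$ onto
the smaller square $T_1(U)$, and the similitude $T_2$ maps $U$ onto the smaller square $T_2(U)$. At the same time, $T_1$ maps the line segment $S$ onto the
smaller line segment $T_1(S)$, and $T_2$ maps $S$ onto the smaller nonoverlapping line segment $T_2(S)$. The union of these two smaller nonoverlapping line
segments is precisely the original line segment $S$; that is,

$$S = T_1(S) \cup T_2(S)$$

Figure 10.13.11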



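EXAMPLE 8 Square

Let us consider the unit square $U$ in the xy-plane (Figure 10.13.12a) and the following four similitudes, all having $s = \frac{1}{2}$ and $\theta = 0$:

$$T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \quad i = 1, 2, 3, 4, \qquad \begin{bmatrix} e_i \\ f_i \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \frac{1}{2} \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ \frac{1}{2} \end{bmatrix}, \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}$$

The images of the unit square $U$ under these four similitudes are the four squares shown in Figure 10.13.12b. Thus,

$$U = T_1(U) \cup T_2(U) \cup T_3(U) \cup T_4(U)$$

is a decomposition of $U$ into four nonoverlapping squares that are congruent to $U$ scaled by the same scale factor ($s = \frac{1}{2}$).

Figure 10.13.12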



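EXAMPLE 9 Sierpinski Carpet

Let us consider a Sierpinski carpet $S$ over the unit square $U$ of the xy-plane (Figure 10.13.13a) and the following eight similitudes, all having $s = \frac{1}{3}$
and $\theta = 0$:

$$T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \quad i = 1, 2, 3, \ldots, 8 \qquad (8)$$

where the eight values of $\begin{bmatrix} e_i \\ f_i \end{bmatrix}$ are

$$\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \frac{1}{3} \\ 0 \end{bmatrix}, \begin{bmatrix} \frac{2}{3} \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ \frac{1}{3} \end{bmatrix}, \begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \end{bmatrix}, \begin{bmatrix} 0 \\ \frac{2}{3} \end{bmatrix}, \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \end{bmatrix}, \begin{bmatrix} \frac{2}{3} \\ \frac{2}{3} \end{bmatrix}$$

The images of $S$ under these eight similitudes are the eight sets shown in Figure 10.13.13b. Thus,

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_8(S)$$

is a decomposition of $S$ into eight nonoverlapping sets that are congruent to $S$ scaled by the same scale factor ($s = \frac{1}{3}$).

Figure 10.13.13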



EXAMPLE 10 Sierpinski Triangle

Let us consider a Sierpinski triangle $S$ fitted inside the unit square $U$ of the xy-plane, as shown in Figure 10.13.14a, and three similitudes $T_1$, $T_2$, $T_3$,
all having $s = \frac{1}{2}$ and $\theta = 0$.

The images of $S$ under these three similitudes are the three sets in Figure 10.13.14b. Thus,

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \qquad (11)$$

is a decomposition of $S$ into three nonoverlapping sets that are congruent to $S$ scaled by the same scale factor ($s = \frac{1}{2}$).

In the preceding examples we started with a specific set $S$ and showed that it was self-similar by finding similitudes $T_1, T_2, T_3, \ldots, T_k$ with the same scale factor
such that $T_1(S), T_2(S), T_3(S), \ldots, T_k(S)$ were nonoverlapping sets and such that

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \qquad (12)$$

The following theorem addresses the converse problem of determining a self-similar set from a collection of similitudes. 



THEOREM 10.13.1 

If $T_1, T_2, T_3, \ldots, T_k$ are contracting similitudes with the same scale factor, then there is a unique nonempty closed and bounded set $S$ in the Euclidean plane
such that

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S)$$

Furthermore, if the sets $T_1(S), T_2(S), T_3(S), \ldots, T_k(S)$ are nonoverlapping, then $S$ is self-similar.



Algorithms for Generating Fractals 

In general, there is no simple way to obtain the set S in the preceding theorem directly. We now describe an iterative procedure that will determine S from the 
similitudes that define it. We first give an example of the procedure and then give an algorithm for the general case. 

EXAMPLE 11 Sierpinski Carpet

Figure 10.13.15 shows the unit square region $S_0$ in the xy-plane, which will serve as an "initial" set for an iterative procedure for the construction of the
Sierpinski carpet. The set $S_1$ in the figure is the result of mapping $S_0$ with each of the eight similitudes $T_i$ ($i = 1, 2, \ldots, 8$) in (8) that determine the
Sierpinski carpet. It consists of eight square regions, each of side length $\frac{1}{3}$, surrounding an empty middle square. Next we apply the eight similitudes to
$S_1$ and arrive at the set $S_2$. Similarly, applying the eight similitudes to $S_2$ results in the set $S_3$. If we continue this process indefinitely, the sequence of
sets $S_1, S_2, S_3, \ldots$ will "converge" to a set $S$, which is the Sierpinski carpet.

Figure 10.13.15



Remark Although we should properly give a definition of what it means for a sequence of sets to "converge" to a given set, an intuitive interpretation will suffice in 
this introductory treatment. 

Although we started in Figure 10.13.15 with the unit square region to arrive at the Sierpinski carpet, we could have started with any nonempty set $S_0$. The only
restriction is that the set $S_0$ be closed and bounded. For example, if we start with the particular set shown in Figure 10.13.16, then $S_1$ is the set obtained by
applying each of the eight similitudes in (8). Applying the eight similitudes to $S_1$ results in the set $S_2$. As before, applying the eight similitudes indefinitely yields the
Sierpinski carpet $S$ as the limiting set.

Figure 10.13.16



The general algorithm illustrated in the preceding example is as follows: Let $T_1, T_2, T_3, \ldots, T_k$ be contracting similitudes with the same scale factor, and for an
arbitrary set $Q$ in $R^2$ define the set $J(Q)$ by

$$J(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q) \cup \cdots \cup T_k(Q)$$

The following algorithm generates a sequence of sets $S_0, S_1, \ldots, S_n, \ldots$ that converges to the set $S$ in Theorem 10.13.1.

Algorithm 1

Step 0 Choose an arbitrary nonempty closed and bounded set $S_0$ in $R^2$.
Step 1 Compute $S_1 = J(S_0)$.
Step 2 Compute $S_2 = J(S_1)$.
Step 3 Compute $S_3 = J(S_2)$.
...
Step n Compute $S_n = J(S_{n-1})$.
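Algorithm 1 can be sketched in Python for the special case of the Sierpinski carpet, where every similitude has $\theta = 0$ and therefore maps an axis-aligned square to an axis-aligned square. The representation of a square as a triple (x, y, side), the helper names, and the list of translations (the eight corner offsets of the subsquares surrounding the middle one) are assumptions made for this illustration. Exact fractions are used so the areas come out without round-off.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

# Assumed translations (e_i, f_i): every cell of the 3-by-3 grid except the middle one
CARPET_SHIFTS = [(i * THIRD, j * THIRD)
                 for i in range(3) for j in range(3) if (i, j) != (1, 1)]

def apply_J(squares, s, shifts):
    """One step of Algorithm 1: the union of T_i(Q) over all similitudes T_i."""
    return [(s * x + e, s * y + f, s * side)
            for (e, f) in shifts
            for (x, y, side) in squares]

def iterate(squares, s, shifts, n):
    """Return S_n, starting from the initial set S_0 = squares."""
    for _ in range(n):
        squares = apply_J(squares, s, shifts)
    return squares

def total_area(squares):
    """Sum of the areas; equals the area of the union when the squares do not overlap."""
    return sum(side * side for (_, _, side) in squares)

S0 = [(Fraction(0), Fraction(0), Fraction(1))]  # the unit square region
for n in range(5):
    Sn = iterate(S0, THIRD, CARPET_SHIFTS, n)
    print(n, len(Sn), float(total_area(Sn)))
```

Each step multiplies the number of squares by 8 and the total area by 8/9, so the areas of the iterates shrink toward zero even though the limiting set is nonempty.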

EXAMPLE 12 Sierpinski Triangle

Let us construct the Sierpinski triangle determined by the three similitudes given in (10). The corresponding set mapping is
$J(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q)$. Figure 10.13.17 shows an arbitrary closed and bounded set $S_0$; the first four iterates $S_1$, $S_2$, $S_3$, $S_4$; and the limiting
set $S$ (the Sierpinski triangle).

EXAMPLE 13 Using Algorithm 1

Consider the following two similitudes:

$$T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} .3 \\ .3 \end{bmatrix}$$

The actions of these two similitudes on the unit square $U$ are illustrated in Figure 10.13.18. Here, the rotation angle $\theta$ is a parameter that we will vary to
generate different self-similar sets. The self-similar sets determined by these two similitudes are shown in Figure 10.13.19 for various values of $\theta$. For
simplicity, we have not drawn the xy-axes, but in each case the origin is the lower left point of the set. These sets were generated on a computer using
Algorithm 1 for the various values of $\theta$. Because $k = 2$ and $s = \frac{1}{2}$, it follows from (2) that the Hausdorff dimension of these sets for any value of $\theta$ is 1. It
can be shown that the topological dimension of these sets is 1 for $\theta = 0$ and 0 for all other values of $\theta$. It follows that the self-similar set for $\theta = 0$ is not
a fractal [it is the straight line segment from (0, 0) to (.6, .6)], while the self-similar sets for all other values of $\theta$ are fractals. In particular, they are
examples of fractals with integer Hausdorff dimension.

A Monte Carlo Approach

The set-mapping approach of constructing self-similar sets described in Algorithm 1 is rather time-consuming on a computer because the similitudes involved must be
applied to each of the many computer screen pixels in the successive iterated sets. In 1985 Michael Barnsley described an alternative, more practical method of
generating a self-similar set defined through its similitudes. It is a so-called Monte Carlo method that takes advantage of probability theory. Barnsley refers to it as
the Random Iteration Algorithm.



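Let $T_1, T_2, T_3, \ldots, T_k$ be contracting similitudes with the same scale factor. The following algorithm generates a sequence of points

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}, \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}, \ldots, \begin{bmatrix} x_n \\ y_n \end{bmatrix}, \ldots$$

that collectively converge to the set $S$ in Theorem 10.13.1.

Algorithm 2

Step 0 Choose an arbitrary point $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ in $S$.

Step 1 Choose one of the $k$ similitudes at random, say $T_{k_1}$, and compute $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = T_{k_1}\left(\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\right)$.

Step 2 Choose one of the $k$ similitudes at random, say $T_{k_2}$, and compute $\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = T_{k_2}\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}\right)$.

...

Step n Choose one of the $k$ similitudes at random, say $T_{k_n}$, and compute $\begin{bmatrix} x_n \\ y_n \end{bmatrix} = T_{k_n}\left(\begin{bmatrix} x_{n-1} \\ y_{n-1} \end{bmatrix}\right)$.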



On a computer screen the pixels corresponding to the points generated by this algorithm will fill out the pixel representation of the limiting set $S$.
Figure 10.13.20 shows four stages of the Random Iteration Algorithm that generate the Sierpinski carpet, starting from the initial point $(x_0, y_0)$.

Remark Although Step 0 in the preceding algorithm requires the selection of an initial point in the set $S$, which may not be known in advance, this is not a serious
problem. In practice, one can usually start with any point in $R^2$ and after a few iterations (say ten or so), the point generated will be sufficiently close to $S$ that the
algorithm will work correctly from that point on.
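A minimal Python sketch of the Random Iteration Algorithm for the Sierpinski carpet follows. It assumes the eight carpet similitudes have $s = \frac{1}{3}$, $\theta = 0$, and translations given by the corner offsets of the eight outer subsquares; the function name, the seed, and the choice of (0, 0) as the starting point (a corner of the carpet) are illustrative assumptions.

```python
import random

# Assumed translations (e_i, f_i) of the eight carpet similitudes
SHIFTS = [(i / 3, j / 3) for i in range(3) for j in range(3) if (i, j) != (1, 1)]

def random_iteration(shifts, s, start, n, seed=0):
    """Generate n points by repeatedly applying a randomly chosen similitude (theta = 0)."""
    rng = random.Random(seed)   # seeded so that runs are reproducible
    x, y = start
    points = []
    for _ in range(n):
        e, f = rng.choice(shifts)        # pick one of the k similitudes at random
        x, y = s * x + e, s * y + f      # apply it to the current point
        points.append((x, y))
    return points

points = random_iteration(SHIFTS, 1 / 3, (0.0, 0.0), 5000)
print(points[:3])
```

Plotting `points` with any graphics utility should produce a picture resembling the first panel of Figure 10.13.20; none of the generated points falls inside the removed middle square.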




Figure 10.13.20 Four stages of the Random Iteration Algorithm: 5000 iterations, 15,000 iterations, 45,000 iterations, and 100,000 iterations.



More General Fractals 

So far, we have discussed fractals that are self-similar sets according to the definition of a self-similar set in $R^2$. However, Theorem 10.13.1 remains true if the
similitudes $T_1, T_2, \ldots, T_k$ are replaced by more general transformations, called contracting affine transformations. An affine transformation is defined as follows:

DEFINITION 5

An affine transformation is a mapping of $R^2$ into $R^2$ of the form

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$$

where $a$, $b$, $c$, $d$, $e$, and $f$ are scalars.



Figure 10.13.21 shows how an affine transformation maps the unit square $U$ onto a parallelogram $T(U)$. An affine transformation is said to be contracting if the
Euclidean distance between any two points in the plane is strictly decreased after the two points are mapped by the transformation. It can be shown that any $k$
contracting affine transformations $T_1, T_2, \ldots, T_k$ determine a unique closed and bounded set $S$ satisfying the equation

$$S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \qquad (13)$$
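One way to test whether a given affine transformation is contracting is sketched below. It relies on a standard linear algebra fact that is not stated in this section: distances are strictly decreased exactly when the largest singular value of the matrix part is less than 1 (the translation plays no role). The function names and the sample matrices are hypothetical.

```python
import math

def largest_singular_value(a, b, c, d):
    """Largest singular value of the 2x2 matrix [[a, b], [c, d]] in closed form."""
    q = a * a + b * b + c * c + d * d          # trace of A^T A
    det = a * d - b * c
    disc = max(q * q - 4 * det * det, 0.0)     # guard against tiny negative round-off
    return math.sqrt((q + math.sqrt(disc)) / 2)

def is_contracting(a, b, c, d):
    """True if x -> A x + (e, f) strictly decreases all Euclidean distances."""
    return largest_singular_value(a, b, c, d) < 1

print(is_contracting(0.5, 0.2, 0.0, 0.5))   # a hypothetical contracting map
print(is_contracting(1.0, 1.0, 0.0, 1.0))   # a shear stretches some distances
```

For a similitude the matrix part is $s$ times a rotation, so the largest singular value is simply $s$, and the test reduces to the restriction $0 < s < 1$ used earlier.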



Equation 13 has the same form as Equation 12, which we used to find self-similar sets. Although Equation 13, which uses contracting affine transformations, does not
determine a self-similar set $S$, the set it does determine has many of the features of self-similar sets. For example, Figure 10.13.22 shows how a set in the plane
resembling a fern (an example made famous by Barnsley) can be generated through four contracting affine transformations. Note that the middle fern is the slightly
overlapping union of the four smaller affine-image ferns surrounding it. Note also how $T_3$, because the determinant of its matrix part is zero, maps the entire fern onto
the small straight line segment between the points (.50, 0) and (.50, .16). Figure 10.13.22 contains a wealth of information and should be studied carefully.

Figure 10.13.22



Michael Barnsley has applied the above theory to the field of data compression and transmission. The fern, for example, is completely determined by the four affine
transformations $T_1$, $T_2$, $T_3$, $T_4$. These four transformations, in turn, are determined by the 24 numbers given in Figure 10.13.22 defining their corresponding values
of $a$, $b$, $c$, $d$, $e$, and $f$. In other words, these 24 numbers completely encode the picture of the fern. Storing these 24 numbers in a computer requires considerably less
memory space than storing a pixel-by-pixel description of the fern. In principle, any picture represented by a pixel map on a computer screen can be described
through a finite number of affine transformations, although it is not easy to determine which transformations to use. Nevertheless, once encoded, the affine
transformations generally require several orders of magnitude less computer memory than a pixel-by-pixel description of the pixel map.



Further Readings 



Readers interested in learning more about fractals are referred to the following books, the first of which elaborates on the linear transformation approach of 
this section. 

1. Michael Barnsley, Fractals Everywhere (New York: Academic Press, 1993).

2. Benoit B. Mandelbrot, The Fractal Geometry of Nature (New York: W. H. Freeman, 1982).

3. Heinz-Otto Peitgen and P. H. Richter, The Beauty of Fractals (New York: Springer-Verlag, 1986).

4. Heinz-Otto Peitgen and Dietmar Saupe, The Science of Fractal Images (New York: Springer-Verlag, 1988).



Exercise Set 10.13 

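1. The self-similar set in Figure Ex-1 has the sizes indicated. Given that its lower left corner is situated at the origin of the xy-plane, find the similitudes that
determine the set. What is its Hausdorff dimension? Is it a fractal?

Figure Ex-1

Answer:

$$T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{12}{25}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \quad i = 1, 2, 3, 4,$$

for four values of the translation vector $\begin{bmatrix} e_i \\ f_i \end{bmatrix}$, and

$$d_H(S) = \ln(4)/\ln\left(\tfrac{25}{12}\right) = 1.888\ldots$$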



2. Find the Hausdorff dimension of the self-similar set shown in Figure Ex-2. Use a ruler to measure the figure and determine an approximate value of the scale factor 
s. What are the rotation angles of the similitudes determining this set? 




Figure Ex-2 



Answer: 

$s \approx .47$; $d_H(S) \approx \ln(4)/\ln(1/.47) = 1.8\ldots$. Rotation angles: (upper left); -90° (upper right); 180° (lower left); 180° (lower right)

3. Each of the 12 self-similar sets in Figure Ex-3 results from three similitudes with scale factor of $\frac{1}{2}$, and so all have Hausdorff dimension ln 3 / ln 2 = 1.584.... The
rotation angles of the three similitudes are all multiples of 90°. Find these rotation angles for each set and express them as a triplet of integers $(n_1, n_2, n_3)$, where
$n_i$ is the corresponding integer multiple of 90° in the order upper right, lower left, lower right. For example, the first set (the Sierpinski triangle) generates the
triplet (0, 0, 0).




Figure Ex-3 

Answer: 

(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), (1, 2, 0), (2, 1, 3), (2, 0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3) 
4. For each of the self-similar sets in Figure Ex-4, find: 

(i) the scale factor s of the similitudes describing the set; 

(ii) the rotation angles $\theta$ of all similitudes describing the set (all rotation angles are multiples of 90°); and

(iii) the Hausdorff dimension of the set.
Which of the sets are fractals and why?

Figure Ex-4

Answer:

(a) (i) $s = \frac{1}{3}$; (ii) all rotation angles are 0°; (iii) $d_H(S) = \ln(7)/\ln(3) = 1.771\ldots$. This set is a fractal.

(b) (i) $s = \frac{1}{2}$; (ii) all rotation angles are 180°; (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

(c) (i) $s = \frac{1}{2}$; (ii) rotation angles: -90° (top); 180° (lower left); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

(d) (i) $s = \frac{1}{2}$; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

5. Show that of the four affine transformations shown in Figure 10.13.22, only the transformation $T_2$ is a similitude. Determine its scale factor $s$ and rotation angle $\theta$.
Answer: 



6. Find the coordinates of the tip of the fern in Figure 10.13.22. [Hint: The transformation $T_2$ maps the tip of the fern to itself.]
Answer: 

(0.766, 0.996) rounded to three decimal places 

7. The square in Figure 10.13.7a was expressed as the union of 4 nonoverlapping squares as in Figure 10.13.7b. Suppose that it is expressed instead as the union of
16 nonoverlapping squares. Verify that its Hausdorff dimension is still 2, as determined by Equation 2. 

Answer: 

8. Show that the four similitudes

express the unit square as the union of four overlapping squares. Evaluate the right-hand side of Equation 2 for the values of $k$ and $s$ determined by these
similitudes, and show that the result is not the correct value of the Hausdorff dimension of the unit square. [Note: This exercise shows the necessity of the
nonoverlapping condition in the definition of a self-similar set and its Hausdorff dimension.]

Answer:

$\ln(4)/\ln\left(\frac{4}{3}\right) = 4.818\ldots$

9. All of the results in this section can be extended to $R^n$. Compute the Hausdorff dimension of the unit cube in $R^3$ (see Figure Ex-9). Given that the topological
dimension of the unit cube is 3, determine whether it is a fractal. [Hint: Express the unit cube as the union of eight smaller congruent nonoverlapping cubes.]




Figure Ex-9 



Answer: 

$d_H(S) = \ln(8)/\ln(2) = 3$; the cube is not a fractal.

10. The set in $R^3$ in Figure Ex-10 is called the Menger sponge. It is a self-similar set obtained by drilling out certain square holes from the unit cube. Note that each
face of the Menger sponge is a Sierpinski carpet and that the holes in the Sierpinski carpet now run all the way through the Menger sponge. Determine the values
of $k$ and $s$ for the Menger sponge and find its Hausdorff dimension. Is the Menger sponge a fractal?




Figure Ex-10 



Answer: 



$k = 20$; $s = \frac{1}{3}$; $d_H(S) = \ln(20)/\ln(3) = 2.726\ldots$; the set is a fractal.

11. The two similitudes

determine a fractal known as the Cantor set. Starting with the unit square region $U$ as an initial set, sketch the first four sets that Algorithm 1 determines. Also,
find the Hausdorff dimension of the Cantor set. (This famous set was the first example that Hausdorff gave in his 1919 paper of a set whose Hausdorff dimension
is not equal to its topological dimension.)



Answer: 





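(Sketches of the first four iterates: first iterate, second iterate, third iterate, fourth iterate.)

$d_H(S) = \ln(2)/\ln(3) = 0.6309\ldots$

12. Compute the areas of the sets $S_0$, $S_1$, $S_2$, $S_3$, and $S_4$ in Figure 10.13.15.

Answer:

Area of $S_0$ = 1; area of $S_1$ = $\frac{8}{9}$ = 0.888...; area of $S_2$ = $\left(\frac{8}{9}\right)^2$ = 0.790...; area of $S_3$ = $\left(\frac{8}{9}\right)^3$ = 0.702...; area of $S_4$ = $\left(\frac{8}{9}\right)^4$ = 0.624...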

Section 10.13 Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may
also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you 
have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. 



T1. Use similitudes of the form

to show that the Menger sponge (see Exercise 10) is the set $S$ satisfying

$$S = \bigcup_{i=1}^{20} T_i(S)$$

for appropriately chosen similitudes $T_i$ (for $i = 1, 2, 3, \ldots, 20$). Determine these similitudes by determining the collection of 3 × 1 matrices that specify them
for $i = 1, 2, 3, \ldots, 20$.

T2. Generalize the ideas involved in the Cantor set (in $R^1$), the Sierpinski carpet (in $R^2$), and the Menger sponge (in $R^3$) to $R^n$ by considering the set $S$ satisfying

where each entry of the translation vectors equals 0, $\frac{1}{3}$, or $\frac{2}{3}$, and no two entries of the same vector ever equal $\frac{1}{3}$ at the same time. Use a computer to construct the set

thereby determining the value of $m_n$ for $n = 2, 3, 4$. Then develop an expression for $m_n$.

10.14 Chaos



In this section we use a map of the unit square in the xy-plane onto itself to describe the concept of a chaotic mapping. 



Prerequisites 

Geometry of Linear Operators on $R^2$ (Section 4.11)

Eigenvalues and Eigenvectors 

Intuitive Understanding of Limits and Continuity 



Chaos 

The word chaos was first used in a mathematical sense in 1975 by Tien-Yien Li and James Yorke in a paper entitled "Period 
Three Implies Chaos." The term is now used to describe the behavior of certain mathematical mappings and physical phenomena 
that at first glance seem to behave in a random or disorderly fashion but actually have an underlying element of order (examples 
include random-number generation, shuffling cards, cardiac arrhythmia, fluttering airplane wings, changes in the red spot of 
Jupiter, and deviations in the orbit of Pluto). In this section we discuss a particular chaotic mapping called Arnold's cat map, after
the Russian mathematician Vladimir I. Arnold who first described it using a diagram of a cat.



Arnold's Cat Map 

To describe Arnold's cat map, we need a few ideas about modular arithmetic. If $x$ is a real number, then the notation $x$ mod 1
denotes the unique number in the interval [0, 1) that differs from $x$ by an integer. For example,

2.3 mod 1 = 0.3, 0.9 mod 1 = 0.9, -3.7 mod 1 = 0.3, 2.0 mod 1 = 0

Note that if $x$ is a nonnegative number, then $x$ mod 1 is simply the fractional part of $x$. If $(x, y)$ is an ordered pair of real numbers,
then the notation $(x, y)$ mod 1 denotes ($x$ mod 1, $y$ mod 1). For example,

(2.3, -7.9) mod 1 = (0.3, 0.1)

Observe that for every real number $x$, the point $x$ mod 1 lies in the unit interval [0, 1) and that for every ordered pair $(x, y)$, the
point $(x, y)$ mod 1 lies in the unit square

$$S = \{(x, y) \mid 0 \le x < 1,\ 0 \le y < 1\}$$

Also observe that the upper boundary and the right-hand boundary of the square are not included in $S$.
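The mod 1 operation is easy to experiment with in Python, whose `%` operator with a positive divisor returns a result with the sign of the divisor, so negative inputs are handled as the definition requires. The function names below are illustrative; because floating-point arithmetic is used, the printed values agree with the examples above only up to round-off.

```python
def mod1(x):
    """Return x mod 1: the number in [0, 1) that differs from x by an integer."""
    return x % 1.0

def mod1_pair(point):
    """Apply mod 1 to each coordinate of an ordered pair."""
    x, y = point
    return (x % 1.0, y % 1.0)

for value in (2.3, 0.9, -3.7, 2.0):
    print(value, "mod 1 =", round(mod1(value), 10))

print(mod1_pair((2.3, -7.9)))
```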



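Arnold's cat map is the transformation $\Gamma: R^2 \to R^2$ defined by the formula

$$\Gamma: (x, y) \to (x + y,\ x + 2y) \bmod 1$$

or, in matrix notation,

$$\Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 \qquad (1)$$

To understand the geometry of Arnold's cat map, it is helpful to write (1) in the factored form

$$\Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1$$

which expresses Arnold's cat map as the composition of a shear in the x-direction with factor 1, followed by a shear in the
y-direction with factor 1. Because the computations are performed mod 1, $\Gamma$ maps all points of $R^2$ into the unit square $S$.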
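The factored form can be checked with a short Python sketch that computes the cat map two ways: directly from the formula, and as two successive shears with a mod 1 reduction after each shear. Exact rational arithmetic (`fractions.Fraction`) is used so that the two results can be compared for equality; the function names are illustrative assumptions.

```python
from fractions import Fraction

def cat_map(x, y):
    """Arnold's cat map: (x, y) -> (x + y, x + 2y) mod 1."""
    return ((x + y) % 1, (x + 2 * y) % 1)

def cat_map_by_shears(x, y):
    """Same map as a shear in x followed by a shear in y, reducing mod 1 after each."""
    x, y = (x + y) % 1, y % 1      # shear in the x-direction with factor 1
    x, y = x % 1, (x + y) % 1      # shear in the y-direction with factor 1
    return (x, y)

p = (Fraction(1, 2), Fraction(3, 4))
print(cat_map(*p), cat_map_by_shears(*p))
```

For the sample point both computations give (1/4, 0).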



We will illustrate the effect of Arnold's cat map on the unit square S, which is shaded in Figure 10.14.1a and contains a picture of 
a cat. It can be shown that it does not matter whether the mod 1 computations are carried out after each shear or at the very end. 
We will discuss both methods, first performing them at the end. The steps are as follows: 

Step 1 Shear in the x-direction with factor 1 (Figure 10.14.1b):

$$(x, y) \to (x + y,\ y)$$

or, in matrix notation,

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + y \\ y \end{bmatrix}$$

Step 2 Shear in the y-direction with factor 1 (Figure 10.14.1c):

$$(x, y) \to (x,\ x + y)$$

or, in matrix notation,

$$\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ x + y \end{bmatrix}$$

Step 3 Reassembly into $S$ (Figure 10.14.1d):

$$(x, y) \to (x, y) \bmod 1$$

The geometric effect of the mod 1 arithmetic is to break up the parallelogram in Figure 10.14.1c and reassemble the pieces of $S$ as
shown in Figure 10.14.1d.

For computer implementation, it is more convenient to perform the mod 1 arithmetic at each step, rather than at the end. With this
approach there is a reassembly at each step, but the net effect is the same. The steps are as follows:

Step 1 Shear in the x-direction with factor 1, followed by a reassembly into $S$ (Figure 10.14.2b):

$$(x, y) \to (x + y,\ y) \bmod 1$$

Step 2 Shear in the y-direction with factor 1, followed by a reassembly into $S$ (Figure 10.14.2c):

$$(x, y) \to (x,\ x + y) \bmod 1$$

Figure 10.14.2



Repeated Mappings 



Chaotic mappings such as Arnold's cat map usually arise in physical models in which an operation is performed repeatedly. For 
example, cards are mixed by repeated shuffles, paint is mixed by repeated stirs, water in a tidal basin is mixed by repeated tidal 
changes, and so forth. Thus, we are interested in examining the effect on S of repeated applications (or iterations) of Arnold's cat 
map. Figure 10.14.3, which was generated on a computer, shows the effect of 25 iterations of Arnold's cat map on the cat in the 
unit square S. Two interesting phenomena occur: 

• The cat returns to its original form at the 25th iteration. 

• At some of the intermediate iterations, the cat is decomposed into streaks that seem to have a specific direction. 
Much of the remainder of this section is devoted to explaining these phenomena.

Figure 10.14.3



Periodic Points 

Our first goal is to explain why the cat in Figure 10.14.3 returns to its original configuration at the 25th iteration. For this purpose 
it will be helpful to think of a picture in the xy-plane as an assignment of colors to the points in the plane. For pictures generated 
on a computer screen or other digital device, hardware limitations require that a picture be broken up into discrete squares, called 
pixels. For example, in the computer-generated pictures in Figure 10.14.3 the unit square S is divided into a grid with 101 pixels 
on a side for a total of 10,201 pixels, each of which is black or white (Figure 10.14.4). An assignment of colors to pixels to create
a picture is called a pixel map.




Figure 10.14.4 



As shown in Figure 10.14.5, each pixel in $S$ can be assigned a unique pair of coordinates of the form $(m/101, n/101)$ that
identifies its lower left-hand corner, where $m$ and $n$ are integers in the range 0, 1, 2, ..., 100. We call these points pixel points
because each such point identifies a unique pixel. Instead of restricting the discussion to the case where $S$ is subdivided into an
array with 101 pixels on a side, let us consider the more general case where there are $p$ pixels per side. Thus, each pixel map in $S$
consists of $p^2$ pixels uniformly spaced $1/p$ units apart in both the x- and the y-directions. The pixel points in $S$ have coordinates
of the form $(m/p, n/p)$, where $m$ and $n$ are integers ranging from 0 to $p - 1$.

Figure 10.14.5



Under Arnold's cat map each pixel point of $S$ is transformed into another pixel point of $S$. To see why this is so, observe that the
image of the pixel point $(m/p, n/p)$ under $\Gamma$ is given in matrix form by

$$\Gamma\left(\begin{bmatrix} \frac{m}{p} \\ \frac{n}{p} \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} \frac{m}{p} \\ \frac{n}{p} \end{bmatrix} \bmod 1 = \begin{bmatrix} \frac{m + n}{p} \\ \frac{m + 2n}{p} \end{bmatrix} \bmod 1$$

The ordered pair $((m + n)/p,\ (m + 2n)/p)$ mod 1 is of the form $(m'/p, n'/p)$, where $m'$ and $n'$ lie in the range
0, 1, 2, ..., $p - 1$. Specifically, $m'$ and $n'$ are the remainders when $m + n$ and $m + 2n$ are divided by $p$, respectively.
Consequently, each point in $S$ of the form $(m/p, n/p)$ is mapped onto another point of the same form.

Because Arnold's cat map transforms every pixel point of $S$ into another pixel point of $S$, and because there are only $p^2$ different
pixel points in $S$, it follows that any given pixel point must return to its original position after at most $p^2$ iterations of Arnold's cat
map.

(verify). Because the point returns to its initial position on the ninth application of Arnold's cat map (but no sooner),
the point is said to have period 9, and the set of nine distinct iterates of the point is called a 9-cycle. Figure 10.14.6 
shows this 9-cycle with the initial point labeled 0 and its successive iterates labeled accordingly. 




Figure 10.14.6 



In general, a point that returns to its initial position after n applications of Arnold's cat map, but does not return with fewer than n 
applications, is said to have period n, and its set of n distinct iterates is called an n-cycle. Arnold's cat map maps (0, 0) into 
(0, 0), so this point has period 1. Points with period 1 are also called fixed points. We leave it as an exercise (Exercise 11) to 
show that (0, 0) is the only fixed point of Arnold's cat map. 



Period Versus Pixel Width

If $P_1$ and $P_2$ are points with periods $q_1$ and $q_2$, respectively, then $P_1$ returns to its initial position in $q_1$ iterations (but no sooner),
and $P_2$ returns to its initial position in $q_2$ iterations (but no sooner); thus, both points return to their initial positions in any number
of iterations that is a multiple of both $q_1$ and $q_2$. In general, for a pixel map with $p^2$ pixel points of the form (m/p, n/p), we let
$\Pi(p)$ denote the least common multiple of the periods of all the pixel points in the map [i.e., $\Pi(p)$ is the smallest integer that is
divisible by all of the periods]. It follows that the pixel map will return to its initial configuration in $\Pi(p)$ iterations of Arnold's
cat map (but no sooner). For this reason, we call $\Pi(p)$ the period of the pixel map. In Exercise 4 we ask you to show that if
p = 101, then all pixel points have period 1, 5, or 25, so $\Pi(101) = 25$. This explains why the cat in Figure 10.14.3 returned to
its initial configuration in 25 iterations.
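The definition of the period of a pixel map can be checked by brute force for small p. The following is a minimal sketch, not part of the text: it stores the pixel point (m/p, n/p) as the integer pair (m, n), and the function names `cat_step`, `point_period`, and `pixel_map_period` are hypothetical names chosen for illustration.

```python
from functools import reduce
from math import gcd


def cat_step(m, n, p):
    """Apply Arnold's cat map to the pixel point (m/p, n/p), stored as integers (m, n)."""
    return (m + n) % p, (m + 2 * n) % p


def point_period(m, n, p):
    """Count the iterations needed for (m/p, n/p) to return to its starting position."""
    start = (m, n)
    m, n = cat_step(m, n, p)
    count = 1
    while (m, n) != start:
        m, n = cat_step(m, n, p)
        count += 1
    return count


def pixel_map_period(p):
    """Return (least common multiple of all point periods, sorted distinct periods)."""
    periods = {point_period(m, n, p) for m in range(p) for n in range(p)}
    lcm = reduce(lambda a, b: a * b // gcd(a, b), periods)
    return lcm, sorted(periods)
```

For instance, `pixel_map_period(6)` returns `(12, [1, 3, 4, 12])`. The approach visits every pixel point, so it is only practical for modest values of p.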

Figure 10.14.7 shows how the period of a pixel map varies with p. Although the general tendency is for the period to increase as p 
increases, there is a surprising amount of irregularity in the graph. Indeed, there is no simple function that specifies this 
relationship (see Exercise 1). 
Figure 10.14.7

Although a pixel map with p pixels on a side does not return to its initial configuration until $\Pi(p)$ iterations have occurred,
various unexpected things can occur at intermediate iterations. For example, Figure 10.14.8 shows a pixel map with p = 250 of
the famous Hungarian-American mathematician John von Neumann. It can be shown that $\Pi(250) = 750$; hence, the pixel map
will return to its initial configuration after 750 iterations of Arnold's cat map (but no sooner). However, after 375 iterations the
pixel map is turned upside down, and after another 375 iterations (for a total of 750) the pixel map is returned to its initial
configuration. Moreover, there are so many pixel points with periods that divide 750 that multiple ghostlike images of the original
likeness occur at intermediate iterations; at 195 iterations numerous miniatures of the original likeness occur in diagonal rows.





The Tiled Plane 



Our next objective is to explain the cause of the linear streaks that occur in Figure 10.14.3. For this purpose it will be helpful to
view Arnold's cat map another way. As defined, Arnold's cat map is not a linear transformation because of the mod 1 arithmetic. 
However, there is an alternative way of defining Arnold's cat map that avoids the mod 1 arithmetic and results in a linear 
transformation. For this purpose, imagine that the unit square S with its picture of the cat is a "tile," and suppose that the entire 
plane is covered with such tiles, as in Figure 10.14.9. We say that the xy-plane has been tiled with the unit square. If we apply the 
matrix transformation in 1 to the entire tiled plane without performing the mod 1 arithmetic, then it can be shown that the portion 
of the image within S will be identical to the image that we obtained using the mod 1 arithmetic (Figure 10.14.9). In short, the 
tiling results in the same pixel map in S as the mod 1 arithmetic, but in the tiled case Arnold's cat map is a linear transformation. 
It is important to understand, however, that tiling and mod 1 arithmetic produce periodicity in different ways. If a pixel map in S
has period n, then in the case of mod 1 arithmetic, each point returns to its original position at the end of n iterations. In the case 
of tiling, points need not return to their original positions; rather, each point is replaced by a point of the same color at the end of n 
iterations. 



Properties of Arnold's Cat Map 



To understand the cause of the streaks in Figure 10.14.3, think of Arnold's cat map as a linear transformation on the tiled plane. 
Observe that the matrix

\[ C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \]

that defines Arnold's cat map is symmetric and has a determinant of 1. The fact that the determinant is 1 means that multiplication
by this matrix preserves areas; that is, the area of any figure in the plane and the area of its image are the same. This is also true
for figures in S in the case of mod 1 arithmetic, since the effect of the mod 1 arithmetic is to cut up the figure and reassemble the
pieces without any overlap, as shown in Figure 10.14.1d. Thus, in Figure 10.14.3 the area of the cat (whatever it is) is the same as
the total area of the blotches in each iteration.



The fact that the matrix is symmetric means that its eigenvalues are real and the corresponding eigenvectors are perpendicular. We
leave it for you to show that the eigenvalues and corresponding eigenvectors of C are

\[
\lambda_1 = \frac{3+\sqrt{5}}{2} = 2.6180\ldots, \qquad \lambda_2 = \frac{3-\sqrt{5}}{2} = 0.3819\ldots,
\]
\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1+\sqrt{5}}{2} \end{bmatrix} = \begin{bmatrix} 1 \\ 1.6180\ldots \end{bmatrix}, \qquad
\mathbf{v}_2 = \begin{bmatrix} \frac{-1-\sqrt{5}}{2} \\ 1 \end{bmatrix} = \begin{bmatrix} -1.6180\ldots \\ 1 \end{bmatrix}
\]

For each application of Arnold's cat map, the eigenvalue $\lambda_1$ causes a stretching in the direction of the eigenvector $\mathbf{v}_1$ by a factor
of 2.6180..., and the eigenvalue $\lambda_2$ causes a compression in the direction of the eigenvector $\mathbf{v}_2$ by a factor of 0.3819.... Figure
10.14.10 shows a square centered at the origin whose sides are parallel to the two eigenvector directions. Under the above
mapping, this square is deformed into the rectangle whose sides are also parallel to the two eigenvector directions. The areas of the
square and rectangle are the same.
Figure 10.14.10



To explain the cause of the streaks in Figure 10.14.3, consider S to be part of the tiled plane, and let p be a point of S with period
n. Because we are considering tiling, there is a point q in the plane with the same color as p that on successive iterations moves
toward the position initially occupied by p, reaching that position on the nth iteration. This point is $\mathbf{q} = (C^{-1})^n \mathbf{p} = C^{-n}\mathbf{p}$, since

\[ C^n \mathbf{q} = C^n (C^{-n} \mathbf{p}) = \mathbf{p} \]

Thus, with successive iterations, points of S flow away from their initial positions, while at the same time other points in the plane 
(with corresponding colors) flow toward those initial positions, completing their trip on the final iteration of the cycle. Figure 

10.14.1 1 illustrates this in the case where ^ = 4, q = ^ ~ ^ j' P — — j- Note that 

p mod 1 = q mod 1 = so both points occupy the same positions on their respective tiles. The outgoing point moves in 

the general direction of the eigenvector $\mathbf{v}_1$, as indicated by the arrows in Figure 10.14.11, and the incoming point moves in the
general direction of eigenvector V2. It is the "flow lines" in the general directions of the eigenvectors that form the streaks in 
Figure 10.14.3. 
Nonperiodic Points

Thus far we have considered the effect of Arnold's cat map on pixel points of the form (m/p, n/p) for an arbitrary positive
integer p. We know that all such points are periodic. We now consider the effect of Arnold's cat map on an arbitrary point (a, b)
in S. We classify such points as rational if the coordinates a and b are both rational numbers, and irrational if at least one of the
coordinates is irrational. Every rational point is periodic, since it is a pixel point for a suitable choice of p. For example, the
rational point $(r_1/s_1, r_2/s_2)$ can be written as $(r_1 s_2/s_1 s_2, \; r_2 s_1/s_1 s_2)$, so it is a pixel point with $p = s_1 s_2$. It can be shown
(Exercise 13) that the converse is also true: Every periodic point must be a rational point.



It follows from the preceding discussion that the irrational points in S are nonperiodic, so that successive iterates of an irrational 
point $(x_0, y_0)$ in S must all be distinct points in S. Figure 10.14.12, which was computer generated, shows an irrational point and
selected iterates up to 100,000. For the particular irrational point that we selected, the iterates do not seem to cluster in any 
particular region of S; rather, they appear to be spread throughout S, becoming denser with successive iterations. 
Figure 10.14.12



The behavior of the iterates in Figure 10.14.12 is sufficiently important that there is some terminology associated with it. We say 
that a set D of points in S is dense in S if every circle centered at any point of S encloses points of D, no matter how small the 
radius of the circle is taken (Figure 10.14.13). It can be shown that the rational points are dense in S and the iterates of most (but 
not all) of the irrational points are dense in S. 
Figure 10.14.13



Definition of Chaos



We know that under Arnold's cat map, the rational points of S are periodic and dense in S and that some but not all of the
irrational points have iterates that are dense in S. These are the basic ingredients of chaos. There are several definitions of chaos in
current use, but the following one, which is an outgrowth of a definition introduced by Robert L. Devaney in 1986 in his book An
Introduction to Chaotic Dynamical Systems (Benjamin/Cummings Publishing Company), is most closely related to our work.

DEFINITION 1

A mapping T of S onto itself is said to be chaotic if:

(i) S contains a dense set of periodic points of the mapping T.

(ii) There is a point in S whose iterates under T are dense in S.

Thus Arnold's cat map satisfies the definition of a chaotic mapping. What is noteworthy about this definition is that a chaotic 
mapping exhibits an element of order and an element of disorder — the periodic points move regularly in cycles, but the points 
with dense iterates move irregularly, often obscuring the regularity of the periodic points. This fusion of order and disorder 
characterizes chaotic mappings. 



Dynamical Systems 

Chaotic mappings arise in the study of dynamical systems. Informally stated, a dynamical system can be viewed as a system that 
has a specific state or configuration at each point of time but that changes its state with time. Chemical systems, ecological 
systems, electrical systems, biological systems, economic systems, and so forth can be looked at in this way. In a discrete-time 
dynamical system, the state changes at discrete points of time rather than at each instant. In a discrete-time chaotic dynamical 
system, each state results from a chaotic mapping of the preceding state. For example, if one imagines that Arnold's cat map is 
applied at discrete points of time, then the pixel maps in Figure 10.14.3 can be viewed as the evolution of a discrete-time chaotic 
dynamical system from some initial set of states (each point of the cat is a single initial state) to successive sets of states. 

One of the fundamental problems in the study of dynamical systems is to predict future states of the system from a known initial 
state. In practice, however, the exact initial state is rarely known because of errors in the devices used to measure the initial state. 
It was believed at one time that if the measuring devices were sufficiently accurate and the computers used to perform the 
iteration were sufficiently powerful, then one could predict the future states of the system to any degree of accuracy. But the 
discovery of chaotic systems shattered this belief because it was found that for such systems the slightest error in measuring the 
initial state or in the computation of the iterates becomes magnified exponentially, thereby preventing an accurate prediction of 
future states. Let us demonstrate this sensitivity to initial conditions with Arnold's cat map. 

Suppose that $P_0$ is a point in the xy-plane whose exact coordinates are (0.77837, 0.70904). A measurement error of 0.00001 is
made in the y-coordinate, such that the point is thought to be located at (0.77837, 0.70905), which we denote by $Q_0$. Both $P_0$
and $Q_0$ are pixel points with p = 100,000 (why?), and thus, since $\Pi(100{,}000) = 75{,}000$, both return to their initial positions
after 75,000 iterations. In Figure 10.14.14 we show the first 50 iterates of $P_0$ under Arnold's cat map as crosses and the first 50
iterates of $Q_0$ as circles. Although $P_0$ and $Q_0$ are close enough that their symbols overlap initially, only their first eight iterates
have overlapping symbols; from the ninth iteration on their iterates follow divergent paths.
Figure 10.14.14



It is possible to quantify the growth of the error from the eigenvalues and eigenvectors of Arnold's cat map. For this purpose we
will think of Arnold's cat map as a linear transformation on the tiled plane. Recall from Figure 10.14.10 and the related discussion
that the projected distance between two points in S in the direction of the eigenvector $\mathbf{v}_1$ increases by a factor of 2.6180... $(= \lambda_1)$
with each iteration (Figure 10.14.15). After nine iterations this projected distance increases by a factor of
$(2.6180\ldots)^9 = 5777.99\ldots$, and with an initial error of roughly 1/100,000 in the direction of $\mathbf{v}_1$, this distance is 0.05777..., or
about 1/17 the width of the unit square S. After 12 iterations this small initial error grows to $(2.6180\ldots)^{12}/100{,}000 = 1.0368\ldots$,
which is greater than the width of S. Thus, we completely lose track of the true iterates within S after 12 iterations because of the
exponential growth of the initial error.
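The numbers in this discussion can be reproduced with a short computation. The sketch below is only an illustration under stated assumptions: the two points are stored as integer pixel coordinates with p = 100,000, and the helper names `iterates` and `torus_gap` are hypothetical. The gap is measured with wraparound on the unit square, which is one reasonable way (not the text's) to compare two points of S.

```python
import math

P = 100_000  # both points are pixel points for this pixel width


def cat_step(m, n, p=P):
    """One application of Arnold's cat map on integer pixel coordinates."""
    return (m + n) % p, (m + 2 * n) % p


def iterates(m, n, count, p=P):
    """First `count` iterates of (m/p, n/p); the starting point is not included."""
    out = []
    for _ in range(count):
        m, n = cat_step(m, n, p)
        out.append((m, n))
    return out


def torus_gap(a, b, p=P):
    """Largest coordinate difference between two pixel points, with wraparound."""
    def wrap(d):
        d %= p
        return min(d, p - d) / p
    return max(wrap(a[0] - b[0]), wrap(a[1] - b[1]))


p_iter = iterates(77837, 70904, 12)   # exact point (0.77837, 0.70904)
q_iter = iterates(77837, 70905, 12)   # measured point (0.77837, 0.70905)
gaps = [torus_gap(a, b) for a, b in zip(p_iter, q_iter)]

lam1 = (3 + math.sqrt(5)) / 2         # stretching eigenvalue 2.6180...
growth9 = lam1 ** 9                   # growth factor after nine iterations
growth12 = lam1 ** 12 / 100_000       # size of the initial error after 12 iterations
```

Printing `gaps` shows the separation growing from 0.00002 after one iteration to a sizable fraction of the unit square by the twelfth.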




Figure 10.14.15 



Although sensitivity to initial conditions limits the ability to predict the future evolution of dynamical systems, new techniques 
are presently being investigated to describe this future evolution in alternative ways. 



Exercise Set 10.14 

1. In a journal article [F. J. Dyson and H. Falk, "Period of a Discrete Cat Mapping," The American Mathematical Monthly, 99
(August-September 1992), pp. 603-614] the following results concerning the nature of the function $\Pi(p)$ were established:

(i) $\Pi(p) = 3p$ if and only if $p = 2 \cdot 5^k$ for k = 1, 2, ....

(ii) $\Pi(p) = 2p$ if and only if $p = 5^k$ for k = 1, 2, ... or $p = 6 \cdot 5^k$ for k = 0, 1, 2, ....

(iii) $\Pi(p) \le 12p/7$ for all other choices of p.

Find $\Pi(250)$, $\Pi(25)$, $\Pi(125)$, $\Pi(30)$, $\Pi(10)$, $\Pi(50)$, $\Pi(3750)$, $\Pi(6)$, and $\Pi(5)$.
Answer:

$\Pi(250) = 750$, $\Pi(25) = 50$, $\Pi(125) = 250$, $\Pi(30) = 60$, $\Pi(10) = 30$, $\Pi(50) = 150$, $\Pi(3750) = 7500$, $\Pi(6) = 12$,
$\Pi(5) = 10$

2. Find all the n-cycles that are subsets of the 36 points in S of the form (m/6, n/6) with m and n in the range 0, 1, 2, 3, 4, 5.
Then find $\Pi(6)$.

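Answer:

One 1-cycle: {(0, 0)}; one 3-cycle: {(1/2, 0), (1/2, 1/2), (0, 1/2)}; two 4-cycles: {(1/3, 0), (1/3, 1/3), (2/3, 0), (2/3, 2/3)} and
{(1/3, 2/3), (0, 2/3), (2/3, 1/3), (0, 1/3)}; two 12-cycles:

{(1/6, 0), (1/6, 1/6), (1/3, 1/2), (5/6, 1/3), (1/6, 1/2), (2/3, 1/6), (5/6, 0), (5/6, 5/6), (2/3, 1/2), (1/6, 2/3), (5/6, 1/2), (1/3, 5/6)}

and

{(0, 1/6), (1/6, 1/3), (1/2, 5/6), (1/3, 1/6), (1/2, 2/3), (1/6, 5/6), (0, 5/6), (5/6, 2/3), (1/2, 1/6), (2/3, 5/6), (1/2, 1/3), (5/6, 1/6)};

$\Pi(6) = 12$.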

3. (Fibonacci Shift-Register Random-Number Generator) A well-known method of generating a sequence of "pseudorandom"
integers $x_0, x_1, x_2, x_3, \ldots$ in the interval from 0 to p - 1 is based on the following algorithm:

(i) Pick any two integers $x_0$ and $x_1$ from the range 0, 1, 2, ..., p - 1.

(ii) Set $x_{n+1} = (x_n + x_{n-1}) \bmod p$ for n = 1, 2, ....

Here x mod p denotes the number in the interval from 0 to p - 1 that differs from x by a multiple of p. For example, 35 mod
9 = 8 (because 8 = 35 - 3 · 9); 36 mod 9 = 0 (because 0 = 36 - 4 · 9); and -3 mod 9 = 6 (because 6 = -3 + 1 · 9).

(a) Generate the sequence of pseudorandom numbers that results from the choices p = 15, $x_0 = 3$, and $x_1 = 7$ until the
sequence starts repeating.

(b) Show that the following formula is equivalent to step (ii) of the algorithm: 

modp for « = 1, 2, 3, ... 

(c) Use the formula in part (b) to generate the sequence of vectors for the choices p = 21, $x_0 = 5$, and $x_1 = 5$ until the
sequence starts repeating.

Answer: 
(a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, ...
(c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5),... 

Remark If we take p = 1 and pick $x_0$ and $x_1$ from the interval [0, 1), then the above random-number generator produces
pseudorandom numbers in the interval [0, 1). The resulting scheme is precisely Arnold's cat map. Furthermore, if we eliminate
the modular arithmetic in the algorithm and take $x_0 = x_1 = 1$, then the resulting sequence of integers is the famous Fibonacci
sequence, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ..., in which each number after the first two is the sum of the preceding two
numbers.
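The algorithm in steps (i) and (ii) is short enough to try directly. The following is a minimal sketch, assuming integer inputs with 0 <= x0, x1 < p; the function name `fibonacci_shift_register` is a hypothetical name, and stopping when the starting pair reappears is one way of interpreting "until the sequence starts repeating."

```python
def fibonacci_shift_register(x0, x1, p):
    """Return the terms generated before the starting pair (x0, x1) reappears."""
    seq = [x0, x1]
    a, b = x0, x1
    while True:
        # Step (ii): the next term is the sum of the previous two, reduced mod p.
        a, b = b, (a + b) % p
        if (a, b) == (x0, x1):
            # The last stored term is the repeated x0, so drop it.
            return seq[:-1]
        seq.append(b)


sequence = fibonacci_shift_register(3, 7, 15)
```

With p = 15, $x_0 = 3$, and $x_1 = 7$ this yields 40 terms before the pair 3, 7 reappears, in agreement with the answer to part (a).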



4. For $C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$, it can be verified that

\[ C^{25} = \begin{bmatrix} 7{,}778{,}742{,}049 & 12{,}586{,}269{,}025 \\ 12{,}586{,}269{,}025 & 20{,}365{,}011{,}074 \end{bmatrix} \]

It can also be verified that 12,586,269,025 is divisible by 101 and that when 7,778,742,049 and 20,365,011,074 are divided by
101, the remainder is 1.

(a) Show that every point in S of the form (m/101, n/101) returns to its starting position after 25 iterations under Arnold's
cat map.

(b) Show that every point in S of the form (m/101, n/101) has period 1, 5, or 25.

(c) Show that the point (1/101, 0) has period greater than 5 by iterating it five times.

(d) Show that $\Pi(101) = 25$.



Answer:

(c) The first five iterates of (1/101, 0) are (1/101, 1/101), (2/101, 3/101), (5/101, 8/101), (13/101, 21/101), and (34/101, 55/101).



^- Show that for the mapping X.S -S defined by T{x, y) = (x 4 -j^, j j mod 1 , every point in 5' is a periodic point. Why does 

this show that the mapping is not chaotic? 
6. An Anosov automorphism on $R^2$ is a mapping from the unit square S onto S of the form

\[ \begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 \]

in which (i) a, b, c, and d are integers, (ii) the determinant of the matrix is ±1, and (iii) the eigenvalues of the matrix do not
have magnitude 1. It can be shown that all Anosov automorphisms are chaotic mappings.

(a) Show that Arnold's cat map is an Anosov automorphism. 

(b) Which of the following are the matrices of an Anosov automorphism? 
(c) Show that the following mapping of S onto S is not an Anosov automorphism.

\[ \begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 \]

What is the geometric effect of this transformation on S? Use your observation to show that the mapping is not a chaotic
mapping by showing that all points in S are periodic points.



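Answer:

(b) The matrices of Anosov automorphisms are

\[ \begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 5 & 7 \\ 2 & 3 \end{bmatrix} \]

(c) The transformation effects a rotation of S through 90° in the clockwise direction.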

7. Show that Arnold's cat map is one-to-one over the unit square S and that its range is S. 

8. Show that the inverse of Arnold's cat map is given by 

\[ \Gamma^{-1}(x, y) = (2x - y, \; -x + y) \bmod 1 \]

9. Show that the unit square S can be partitioned into four triangular regions on each of which Arnold's cat map is a
transformation of the form

\[ \begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} a \\ b \end{bmatrix} \]

where a and b need not be the same for each region. [Hint: Find the regions in S that map onto the four shaded regions of the
parallelogram in Figure 10.14.1d.]
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II:[;] = [j];inregionIII:[^] = [_; 



; in region IV: 



; m region J 

10. If $(x_0, y_0)$ is a point in S and $(x_n, y_n)$ is its nth iterate under Arnold's cat map, show that

\[ \begin{bmatrix} x_n \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1 \]

This result implies that the modular arithmetic need only be performed once rather than after each iteration.
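11. Show that (0, 0) is the only fixed point of Arnold's cat map by showing that the only solution of the equation

\[ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1 \]

with $0 \le x_0 < 1$ and $0 \le y_0 < 1$ is $x_0 = y_0 = 0$. [Hint: For appropriate nonnegative integers, r and s, we can write

\[ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix} \]

for the preceding equation.]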
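12. Find all 2-cycles of Arnold's cat map by finding all solutions of the equation

\[ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{2} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1 \]

with $0 \le x_0 < 1$ and $0 \le y_0 < 1$. [Hint: For appropriate nonnegative integers, r and s, we can write

\[ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix} \]

for the preceding equation.]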
Answer: 

(1/5, 3/5) and (4/5, 2/5) form one 2-cycle, and (2/5, 1/5) and (3/5, 4/5) form another 2-cycle.

13. Show that every periodic point of Arnold's cat map must be a rational point by showing that for all solutions of the equation 

the numbers $x_0$ and $y_0$ are quotients of integers.

14. Let T be Arnold's cat map applied five times in a row; that is, $T = \Gamma^5$. Figure Ex-14 represents four successive mappings
of T on the first image, each image having a resolution of 101 × 101 pixels. The fifth mapping returns to the first image
because this cat map has a period of 25. Explain how you might generate this particular sequence of images.



Figure Ex-14 



Answer: 

Begin with a 101 × 101 array of white pixels and add the letter 'A' in black pixels to it. Apply the mapping to this image,
which will scatter the black pixels throughout the image. Then superimpose the letter 'B' in black pixels onto this image.
Apply the mapping again and then superimpose the letter 'C' in black pixels onto the resulting image. Repeat this procedure
with the letters 'D' and 'E'. The next application of the mapping will return you to the letter 'A' with the pixels for the letters
'B' through 'E' scattered in the background.

Section 10.14 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica,
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some 
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 



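integer satisfying the equation

\[ \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)} \bmod p = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

This suggests that one way to determine $\Pi(p)$ is to compute

\[ \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \bmod p \]

starting with n = 1 and stopping when this produces the identity matrix. Use this idea to compute $\Pi(p)$ for p = 2, ....
Compare your results to the formulas given in Exercise 1, if they apply. What can you conjecture about

\[ \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)/2} \bmod p \]

when $\Pi(p)$ is even?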
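The stopping rule described in this exercise can be sketched in a few lines. This is a minimal illustration, assuming p is an integer with p >= 2; the function name `cat_matrix_period` is hypothetical, and the 2 × 2 matrix is stored as four integers rather than with a matrix library.

```python
def cat_matrix_period(p):
    """Smallest n >= 1 for which the n-th power of [[1, 1], [1, 2]] is the identity mod p."""
    a, b, c, d = 1 % p, 1 % p, 1 % p, 2 % p   # current power, starting with n = 1
    n = 1
    while (a, b, c, d) != (1, 0, 0, 1):
        # Multiply the current power on the right by [[1, 1], [1, 2]] and reduce mod p.
        a, b, c, d = (a + b) % p, (a + 2 * b) % p, (c + d) % p, (c + 2 * d) % p
        n += 1
    return n
```

Because only one 2 × 2 matrix is tracked, this is much cheaper than following every pixel point, and it reproduces values quoted elsewhere in the section, such as 25 for p = 101 and 750 for p = 250.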

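T2. The eigenvalues and eigenvectors for the cat map matrix

\[ C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \]

are

\[
\lambda_1 = \frac{3+\sqrt{5}}{2}, \quad \lambda_2 = \frac{3-\sqrt{5}}{2}, \quad
\mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1+\sqrt{5}}{2} \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 1 \\ \frac{1-\sqrt{5}}{2} \end{bmatrix}
\]

Using these eigenvalues and eigenvectors, we can define

\[
D = \begin{bmatrix} \frac{3+\sqrt{5}}{2} & 0 \\ 0 & \frac{3-\sqrt{5}}{2} \end{bmatrix}
\quad \text{and} \quad
P = \begin{bmatrix} 1 & 1 \\ \frac{1+\sqrt{5}}{2} & \frac{1-\sqrt{5}}{2} \end{bmatrix}
\]

and write $C = PDP^{-1}$; hence, $C^n = PD^nP^{-1}$. Use a computer to show that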

C" = 

where 











(Yi) 




'=21 


<^22 



2|/5 



2/5 



and 



How can you use these results and your conclusions in Exercise T1 to simplify the method for computing $\Pi(p)$?



Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



10.15 Cryptography 

In this section we present a method of encoding and decoding messages. We also examine modular arithmetic and show 
how Gaussian elimination can sometimes be used to break an opponent's code. 


Prerequisites 

Matrices 

Gaussian Elimination 

Matrix Operations 

Linear Independence 

Linear Transformations (Section 4.9) 




Ciphers 

The study of encoding and decoding secret messages is called cryptography. Although secret codes date to the earliest days 
of written communication, there has been a recent surge of interest in the subject because of the need to maintain the 
privacy of information transmitted over public lines of communication. In the language of cryptography, codes are called 
ciphers, uncoded messages are called plaintext, and coded messages are called ciphertext. The process of converting from 
plaintext to ciphertext is called enciphering, and the reverse process of converting from ciphertext to plaintext is called 
deciphering. 

The simplest ciphers, called substitution ciphers, are those that replace each letter of the alphabet by a different letter. For 
example, in the substitution cipher 

Plain   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

the plaintext letter A is replaced by D, the plaintext letter B by E, and so forth. With this cipher the plaintext message

ROME WAS NOT BUILT IN A DAY

becomes

URPH ZDV QRW EXLOW LQ D GDB
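The substitution table above shifts every letter three places forward in the alphabet, wrapping around at Z. A minimal sketch of that rule follows; the function name `shift_encipher` and its `shift` parameter are hypothetical names used only for illustration.

```python
import string


def shift_encipher(plaintext, shift=3):
    """Replace each letter by the letter `shift` places later, wrapping from Z to A."""
    upper = string.ascii_uppercase
    table = str.maketrans(upper, upper[shift:] + upper[:shift])
    # Characters that are not capital letters (such as spaces) are left unchanged.
    return plaintext.upper().translate(table)


ciphertext = shift_encipher("ROME WAS NOT BUILT IN A DAY")
```

Applied to the message above, this reproduces the ciphertext URPH ZDV QRW EXLOW LQ D GDB.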



Hill Ciphers 

A disadvantage of substitution ciphers is that they preserve the frequencies of individual letters, making it relatively easy to 
break the code by statistical methods. One way to overcome this problem is to divide the plaintext into groups of letters and 
encipher the plaintext group by group, rather than one letter at a time. A system of cryptography in which the plaintext is 
divided into sets of n letters, each of which is replaced by a set of n cipher letters, is called a polygraphic system. In this 
section we will study a class of polygraphic systems based on matrix transformations. [The ciphers that we will discuss are
called Hill ciphers after Lester S. Hill, who introduced them in two papers: "Cryptography in an Algebraic Alphabet,"
American Mathematical Monthly, 36 (June-July 1929), pp. 306-312; and "Concerning Certain Linear Transformation
Apparatus of Cryptography," American Mathematical Monthly, 38 (March 1931), pp. 135-154.]



In the discussion to follow, we assume that each plaintext and ciphertext letter except Z is assigned the numerical value that 
specifies its position in the standard alphabet (Table 1). For reasons that will become clear later, Z is assigned a value of 
zero. 

Table 1

A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0



In the simplest Hill ciphers, successive pairs of plaintext are transformed into ciphertext by the following procedure:

Step 1 Choose a 2 x 2 matrix with integer entries

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]

to perform the encoding. Certain additional conditions on A will be imposed later.

Step 2 Group successive plaintext letters into pairs, adding an arbitrary "dummy" letter to fill out the last pair if the
plaintext has an odd number of letters, and replace each plaintext letter by its numerical value.

Step 3 Successively convert each plaintext pair $p_1 p_2$ into a column vector

\[ \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} \]

and form the product Ap. We will call p a plaintext vector and Ap the corresponding ciphertext vector.

Step 4 Convert each ciphertext vector into its alphabetic equivalent.

EXAMPLE 1 Hill Cipher of a Message

Use the matrix

\[ A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \]

to obtain the Hill cipher for the plaintext message

I AM HIDING

Solution If we group the plaintext into pairs and add the dummy letter G to fill out the last pair, we obtain

IA MH ID IN GG

or, equivalently, from Table 1,

9 1   13 8   9 4   9 14   7 7

To encipher the pair IA, we form the matrix product

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 9 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 \\ 3 \end{bmatrix} \]

which, from Table 1, yields the ciphertext KC.
To encipher the pair MH, we form the product

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 13 \\ 8 \end{bmatrix} = \begin{bmatrix} 29 \\ 24 \end{bmatrix} \]

However, there is a problem here, because the number 29 has no alphabet equivalent (Table 1). To resolve
this problem, we make the following agreement:

Whenever an integer greater than 25 occurs, it will be
replaced by the remainder that results when this
integer is divided by 26.

Because the remainder after division by 26 is one of the integers 0, 1, 2, ..., 25, this procedure will always
yield an integer with an alphabet equivalent.

Thus, in the product for MH we replace 29 by 3, which is the remainder after dividing 29 by 26. It now follows from Table 1
that the ciphertext for the pair MH is CX.

The computations for the remaining ciphertext vectors are

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 9 \\ 4 \end{bmatrix} = \begin{bmatrix} 17 \\ 12 \end{bmatrix} \]

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 9 \\ 14 \end{bmatrix} = \begin{bmatrix} 37 \\ 42 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 11 \\ 16 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 7 \\ 7 \end{bmatrix} = \begin{bmatrix} 21 \\ 21 \end{bmatrix} \]

These correspond to the ciphertext pairs QL, KP, and UU, respectively. In summary, the entire ciphertext
message is

KC CX QL KP UU

which would usually be transmitted as a single string without spaces:

KCCXQLKPUU

Because the plaintext was grouped in pairs and enciphered by a 2 x 2 matrix, the Hill cipher in Example 1 is referred to as a
Hill 2-cipher. It is obviously also possible to group the plaintext in triples and encipher by a 3 x 3 matrix with integer
entries; this is called a Hill 3-cipher. In general, for a Hill n-cipher, plaintext is grouped into sets of n letters and
enciphered by an n x n matrix with integer entries.
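Steps 1 through 4 can be sketched as a short program. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, non-letter characters are simply dropped, and the dummy letter defaults to a repeat of the last plaintext letter (the text allows any dummy letter, and Example 1 happens to use G, which is also the last letter of its message).

```python
def letter_to_number(ch):
    """Table 1: A=1, B=2, ..., Y=25, Z=0."""
    return (ord(ch) - ord("A") + 1) % 26


def number_to_letter(k):
    """Inverse of Table 1 for k in 0..25."""
    return chr((k - 1) % 26 + ord("A"))


def hill_encipher(plaintext, A, dummy=None):
    """Encipher with a Hill n-cipher, where A is an n x n list of integer rows."""
    letters = [c for c in plaintext.upper() if "A" <= c <= "Z"]
    n = len(A)
    # Step 2: pad the final group with a dummy letter if it is incomplete.
    while len(letters) % n:
        letters.append(dummy or letters[-1])
    out = []
    for i in range(0, len(letters), n):
        p = [letter_to_number(c) for c in letters[i:i + n]]
        # Step 3: form the product A p, replacing each entry by its remainder mod 26.
        c = [sum(A[r][k] * p[k] for k in range(n)) % 26 for r in range(n)]
        # Step 4: convert the ciphertext vector back to letters.
        out.extend(number_to_letter(v) for v in c)
    return "".join(out)


ciphertext = hill_encipher("I AM HIDING", [[1, 2], [0, 3]])
```

With the matrix and message of Example 1, `ciphertext` is `"KCCXQLKPUU"`. Because `n` is taken from the size of `A`, the same sketch handles Hill 3-ciphers and larger.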



Modular Arithmetic 



In Example 1, integers greater than 25 were replaced by their remainders after division by 26. This technique of working 
with remainders is at the core of a body of mathematics called modular arithmetic. Because of its importance in 
cryptography, we will digress for a moment to touch on some of the main ideas in this area. 

In modular arithmetic we are given a positive integer m, called the modulus, and any two integers whose difference is an 
integer multiple of the modulus are regarded as "equal" or "equivalent" with respect to the modulus. More precisely, we 
make the following definition. 

DEFINITION 1

If m is a positive integer and a and b are any integers, then we say that a is equivalent to b modulo m, written

a = b (mod m)

if a - b is an integer multiple of m.



EXAMPLE 2 Various Equivalences

7 = 2 (mod 5)
19 = 3 (mod 2)
-1 = 25 (mod 26)
12 = 0 (mod 4)



For any modulus m it can be proved that every integer a is equivalent, modulo m, to exactly one of the integers

0, 1, 2, ..., m - 1

We call this integer the residue of a modulo m, and we write

$Z_m = \{0, 1, 2, \ldots, m - 1\}$

to denote the set of residues modulo m.

If a is a nonnegative integer, then its residue modulo m is simply the remainder that results when a is divided by m. For 
arbitrary integer a, the residue can be found using the following theorem. 



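THEOREM 10.15.1

For any integer a and modulus m, let

\[ R = \text{remainder of } \frac{|a|}{m} \]

Then the residue r of a modulo m is given by

\[
r = \begin{cases}
R & \text{if } a \ge 0 \\
m - R & \text{if } a < 0 \text{ and } R \ne 0 \\
0 & \text{if } a < 0 \text{ and } R = 0
\end{cases}
\]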



EXAMPLE 3 Residues mod 26

Find the residue modulo 26 of (a) 87, (b) -38, and (c) -26.
Solution

(a) Dividing |87| = 87 by 26 yields a remainder of R = 9, so r = 9. Thus,

87 = 9 (mod 26)

(b) Dividing |-38| = 38 by 26 yields a remainder of R = 12, so r = 26 - 12 = 14. Thus,

-38 = 14 (mod 26)

(c) Dividing |-26| = 26 by 26 yields a remainder of R = 0. Thus,

-26 = 0 (mod 26)
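Theorem 10.15.1 translates directly into code. The sketch below follows the three cases of the theorem; the function name `residue` is a hypothetical name. As an aside that is not from the text, Python's built-in `%` operator already returns a value in 0, 1, ..., m - 1 when m is positive, so it gives the same result.

```python
def residue(a, m):
    """Residue of the integer a modulo m (m a positive integer), per Theorem 10.15.1."""
    R = abs(a) % m          # remainder when |a| is divided by m
    if a >= 0:
        return R
    if R != 0:
        return m - R
    return 0


examples = [residue(87, 26), residue(-38, 26), residue(-26, 26)]
```

The list `examples` holds the three residues found in Example 3, namely 9, 14, and 0.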



In ordinary arithmetic every nonzero number a has a reciprocal or multiplicative inverse, denoted by $a^{-1}$, such that

\[ a a^{-1} = a^{-1} a = 1 \]

In modular arithmetic we have the following corresponding concept:

DEFINITION 2

If a is a number in $Z_m$, then a number $a^{-1}$ in $Z_m$ is called a reciprocal or multiplicative inverse of a modulo m if
$a a^{-1} = a^{-1} a = 1$ (mod m).

It can be proved that if a and m have no common prime factors, then a has a unique reciprocal modulo m; conversely, if a
and m have a common prime factor, then a has no reciprocal modulo m.

EXAMPLE 4 Reciprocal of 3 mod 26

The number 3 has a reciprocal modulo 26 because 3 and 26 have no common prime factors. This reciprocal
can be obtained by finding the number x in $Z_{26}$ that satisfies the modular equation

3x = 1 (mod 26)

Although there are general methods for solving such modular equations, it would take us too far afield to
study them. However, because 26 is relatively small, this equation can be solved by trying the possible
solutions, 0 to 25, one at a time. With this approach we find that x = 9 is the solution, because

3 · 9 = 27 = 1 (mod 26)

Thus,

$3^{-1} = 9$ (mod 26)

EXAMPLE 5 A Number with No Reciprocal mod 26

The number 4 has no reciprocal modulo 26, because 4 and 26 have 2 as a common prime factor (see Exercise
8).

For future reference, in Table 2 we provide the following reciprocals modulo 26: 

Table 2 Reciprocals Modulo 26

a        1   3   5   7   9   11  15  17  19  21  23  25
a^{-1}   1   9   21  15  3   19  7   23  11  5   17  25


Deciphering 

Every useful cipher must have a procedure for decipherment. In the case of a Hill cipher, decipherment uses the inverse
(mod 26) of the enciphering matrix. To be precise, if $m$ is a positive integer, then a square matrix $A$ with entries in $Z_m$ is
said to be invertible modulo $m$ if there is a matrix $B$ with entries in $Z_m$ such that

\[ AB = BA = I \pmod{m} \]

Suppose now that

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]

is invertible modulo 26 and this matrix is used in a Hill 2-cipher. If

\[ \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} \]

is a plaintext vector, then

\[ \mathbf{c} = A\mathbf{p} \pmod{26} \qquad (1) \]

is the corresponding ciphertext vector and

\[ \mathbf{p} = A^{-1}\mathbf{c} \pmod{26} \]

Thus, each plaintext vector can be recovered from the corresponding ciphertext vector by multiplying it on the left by
$A^{-1}$ (mod 26).

In cryptography it is important to know which matrices are invertible modulo 26 and how to obtain their inverses. We now 
investigate these questions. 

In ordinary arithmetic, a square matrix $A$ is invertible if and only if $\det(A) \ne 0$, or, equivalently, if and only if $\det(A)$ has a
reciprocal. The following theorem is the analog of this result in modular arithmetic.

THEOREM 10.15.2

A square matrix $A$ with entries in $Z_m$ is invertible modulo $m$ if and only if the residue of $\det(A)$ modulo $m$ has a
reciprocal modulo $m$.

Because the residue of $\det(A)$ modulo $m$ will have a reciprocal modulo $m$ if and only if this residue and $m$ have no common
prime factors, we have the following corollary.

COROLLARY 10.15.3

A square matrix $A$ with entries in $Z_m$ is invertible modulo $m$ if and only if $m$ and the residue of $\det(A)$ modulo $m$
have no common prime factors.

Because the only prime factors of $m = 26$ are 2 and 13, we have the following corollary, which is useful in cryptography.

COROLLARY 10.15.4

A square matrix $A$ with entries in $Z_{26}$ is invertible modulo 26 if and only if the residue of $\det(A)$ modulo 26 is not
divisible by 2 or 13.



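We leave it for you to verify that if

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

has entries in $Z_{26}$ and the residue of $\det(A) = ad - bc$ modulo 26 is not divisible by 2 or 13, then the inverse of $A$ (mod
26) is given by

\[ A^{-1} = (ad - bc)^{-1} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \pmod{26} \qquad (2) \]

where $(ad - bc)^{-1}$ is the reciprocal of the residue of $ad - bc$ (mod 26).

EXAMPLE 6 Inverse of a Matrix mod 26

Find the inverse of

\[ A = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix} \]

modulo 26.

Solution

\[ \det(A) = ad - bc = 5 \cdot 3 - 6 \cdot 2 = 3 \]

so from Table 2,

\[ (ad - bc)^{-1} = 3^{-1} = 9 \pmod{26} \]

Thus, from Equation 2,

\[ A^{-1} = 9 \begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 27 & -54 \\ -18 & 45 \end{bmatrix} = \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \pmod{26} \]

As a check,

\[ AA^{-1} = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} = \begin{bmatrix} 53 & 234 \\ 26 & 105 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \pmod{26} \]

Similarly, $A^{-1}A = I$.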



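EXAMPLE 7 Decoding a Hill 2-Cipher

Decode the following Hill 2-cipher, which was enciphered by the matrix in Example 6:

GTNKGKDUSK

Solution From Table 1 the numerical equivalent of this ciphertext is

7 20 14 11 7 11 4 21 19 11

To obtain the plaintext pairs, we multiply each ciphertext vector by the inverse of A (obtained in Example 6):

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 20 \end{bmatrix} = \begin{bmatrix} 487 \\ 436 \end{bmatrix} = \begin{bmatrix} 19 \\ 20 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 14 \\ 11 \end{bmatrix} = \begin{bmatrix} 278 \\ 321 \end{bmatrix} = \begin{bmatrix} 18 \\ 9 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 11 \end{bmatrix} = \begin{bmatrix} 271 \\ 265 \end{bmatrix} = \begin{bmatrix} 11 \\ 5 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 4 \\ 21 \end{bmatrix} = \begin{bmatrix} 508 \\ 431 \end{bmatrix} = \begin{bmatrix} 14 \\ 15 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 19 \\ 11 \end{bmatrix} = \begin{bmatrix} 283 \\ 361 \end{bmatrix} = \begin{bmatrix} 23 \\ 23 \end{bmatrix} \pmod{26} \]

From Table 1, the alphabet equivalents of these vectors are

ST RI KE NO WW

which yields the message

STRIKE NOW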
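
The computations of Examples 6 and 7 can be reproduced with a short Python sketch. The function names are hypothetical, and the letter numbering (A = 1, ..., Y = 25, Z = 0) is an assumption about Table 1, which lies outside this passage; the built-in `pow(x, -1, m)` (Python 3.8 or later) supplies the reciprocal modulo m and raises `ValueError` when none exists.

```python
def inverse_2x2_mod(A, m=26):
    """Inverse of a 2x2 matrix modulo m via Equation 2 of the text."""
    (a, b), (c, d) = A
    det_inv = pow((a * d - b * c) % m, -1, m)   # reciprocal of det(A) mod m
    return [[(det_inv * d) % m, (det_inv * -b) % m],
            [(det_inv * -c) % m, (det_inv * a) % m]]


def decipher(ciphertext, A, m=26):
    """Decode a Hill 2-cipher; assumes A=1, ..., Y=25, Z=0 (assumed Table 1)."""
    A_inv = inverse_2x2_mod(A, m)
    nums = [(ord(ch) - ord("A") + 1) % m for ch in ciphertext]
    letters = []
    for i in range(0, len(nums), 2):
        c1, c2 = nums[i], nums[i + 1]
        for row in A_inv:
            p = (row[0] * c1 + row[1] * c2) % m
            letters.append("Z" if p == 0 else chr(p + ord("A") - 1))
    return "".join(letters)


A = [[5, 6], [2, 3]]
print(inverse_2x2_mod(A))
print(decipher("GTNKGKDUSK", A))
```

The decoded string keeps the final padding letter, so the spaces and the trailing W still have to be interpreted by the reader, as in the example.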



Breaking a Hill Cipher 

Because the purpose of enciphering messages and information is to prevent "opponents" from learning their contents, 
cryptographers are concerned with the security of their ciphers — that is, how readily they can be broken (deciphered by 
their opponents). We will conclude this section by discussing one technique for breaking Hill ciphers. 

Suppose that you are able to obtain some corresponding plaintext and ciphertext from an opponent's message. For example, 
on examining some intercepted ciphertext, you may be able to deduce that the message is a letter that begins DEAR SIR. We 
will show that with a small amount of such data, it may be possible to determine the deciphering matrix of a Hill code and 
consequently obtain access to the rest of the message. 

It is a basic result in linear algebra that a linear transformation is completely determined by its values at a basis. This
principle suggests that if we have a Hill n-cipher, and if

\[ \mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n \]

are linearly independent plaintext vectors whose corresponding ciphertext vectors

\[ A\mathbf{p}_1, A\mathbf{p}_2, \ldots, A\mathbf{p}_n \]

are known, then there is enough information available to determine the matrix $A$ and hence $A^{-1}$ (mod m).
The following theorem, whose proof is discussed in the exercises, provides a way to do this.



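THEOREM 10.15.5 Determining the Deciphering Matrix

Let $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ be linearly independent plaintext vectors, and let $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ be the corresponding ciphertext
vectors in a Hill n-cipher. If

\[ P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \vdots \\ \mathbf{p}_n^T \end{bmatrix} \]

is the $n \times n$ matrix with row vectors $\mathbf{p}_1^T, \mathbf{p}_2^T, \ldots, \mathbf{p}_n^T$ and if

\[ C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \\ \vdots \\ \mathbf{c}_n^T \end{bmatrix} \]

is the $n \times n$ matrix with row vectors $\mathbf{c}_1^T, \mathbf{c}_2^T, \ldots, \mathbf{c}_n^T$, then the sequence of elementary row operations that reduces $C$
to $I$ transforms $P$ to $(A^{-1})^T$.

This theorem tells us that to find the transpose of the deciphering matrix $A^{-1}$, we must find a sequence of row operations
that reduces $C$ to $I$ and then perform this same sequence of operations on $P$. The following example illustrates a simple
algorithm for doing this.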

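EXAMPLE 8 Using Theorem 10.15.5

The following Hill 2-cipher is intercepted:

IOSBTGXESPXHOPDE

Decipher the message, given that it starts with the word DEAR.

Solution From Table 1, the numerical equivalent of the known plaintext is

D E   A R
4 5   1 18

and the numerical equivalent of the corresponding ciphertext is

I O    S B
9 15   19 2

so the corresponding plaintext and ciphertext vectors are

\[ \mathbf{p}_1 = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \leftrightarrow \mathbf{c}_1 = \begin{bmatrix} 9 \\ 15 \end{bmatrix}, \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \leftrightarrow \mathbf{c}_2 = \begin{bmatrix} 19 \\ 2 \end{bmatrix} \]

We want to reduce

\[ C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \end{bmatrix} = \begin{bmatrix} 9 & 15 \\ 19 & 2 \end{bmatrix} \]

to $I$ by elementary row operations and simultaneously apply these operations to

\[ P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 1 & 18 \end{bmatrix} \]

to obtain $(A^{-1})^T$ (the transpose of the deciphering matrix). This can be accomplished by adjoining $P$ to the
right of $C$ and applying row operations to the resulting matrix $[C \mid P]$ until the left side is reduced to $I$. The
final matrix will then have the form $[I \mid (A^{-1})^T]$. The computations can be carried out as follows:

\[ \left[\begin{array}{rr|rr} 9 & 15 & 4 & 5 \\ 19 & 2 & 1 & 18 \end{array}\right] \quad \text{We formed the matrix } [C \mid P]. \]

\[ \left[\begin{array}{rr|rr} 1 & 45 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right] \quad \text{We multiplied the first row by } 9^{-1} = 3. \]

\[ \left[\begin{array}{rr|rr} 1 & 19 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right] \quad \text{We replaced 45 by its residue modulo 26.} \]

\[ \left[\begin{array}{rr|rr} 1 & 19 & 12 & 15 \\ 0 & -359 & -227 & -267 \end{array}\right] \quad \text{We added } -19 \text{ times the first row to the second.} \]

\[ \left[\begin{array}{rr|rr} 1 & 19 & 12 & 15 \\ 0 & 5 & 7 & 19 \end{array}\right] \quad \text{We replaced the entries in the second row by their residues modulo 26.} \]

\[ \left[\begin{array}{rr|rr} 1 & 19 & 12 & 15 \\ 0 & 1 & 147 & 399 \end{array}\right] \quad \text{We multiplied the second row by } 5^{-1} = 21. \]

\[ \left[\begin{array}{rr|rr} 1 & 19 & 12 & 15 \\ 0 & 1 & 17 & 9 \end{array}\right] \quad \text{We replaced the entries in the second row by their residues modulo 26.} \]

\[ \left[\begin{array}{rr|rr} 1 & 0 & -311 & -156 \\ 0 & 1 & 17 & 9 \end{array}\right] \quad \text{We added } -19 \text{ times the second row to the first.} \]

\[ \left[\begin{array}{rr|rr} 1 & 0 & 1 & 0 \\ 0 & 1 & 17 & 9 \end{array}\right] \quad \text{We replaced the entries in the first row by their residues modulo 26.} \]

Thus,

\[ (A^{-1})^T = \begin{bmatrix} 1 & 0 \\ 17 & 9 \end{bmatrix} \]

so the deciphering matrix is

\[ A^{-1} = \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \]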

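To decipher the message, we first group the ciphertext into pairs and find the numerical equivalent of each
letter:

I O    S B    T G    X E    S P     X H    O P     D E
9 15   19 2   20 7   24 5   19 16   24 8   15 16   4 5

Next, we multiply successive ciphertext vectors on the left by $A^{-1}$ and find the alphabet equivalents of the
resulting plaintext pairs (all products are reduced mod 26):

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 9 \\ 15 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \quad \text{D E} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \quad \text{A R} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 20 \\ 7 \end{bmatrix} = \begin{bmatrix} 9 \\ 11 \end{bmatrix} \quad \text{I K} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 5 \end{bmatrix} = \begin{bmatrix} 5 \\ 19 \end{bmatrix} \quad \text{E S} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 16 \end{bmatrix} = \begin{bmatrix} 5 \\ 14 \end{bmatrix} \quad \text{E N} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 8 \end{bmatrix} = \begin{bmatrix} 4 \\ 20 \end{bmatrix} \quad \text{D T} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 15 \\ 16 \end{bmatrix} = \begin{bmatrix} 1 \\ 14 \end{bmatrix} \quad \text{A N} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 11 \\ 19 \end{bmatrix} \quad \text{K S} \]

Finally, we construct the message from the plaintext pairs:

DE AR IK ES EN DT AN KS
DEAR IKE SEND TANKS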
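
The row-reduction procedure of Theorem 10.15.5 can be sketched as follows. This is an illustrative implementation, not the text's own algorithm in every detail: the function name `deciphering_matrix` is hypothetical, and the pivot search is a simplification that only looks for a pivot entry with a reciprocal modulo m, so it can fail (with `StopIteration`) on some inputs even when C is invertible modulo m.

```python
from math import gcd


def deciphering_matrix(P, C, m=26):
    """Reduce [C | P] to [I | (A^-1)^T] modulo m and return A^-1."""
    n = len(C)
    aug = [[x % m for x in C[i]] + [x % m for x in P[i]] for i in range(n)]
    for col in range(n):
        # Pick a row at or below the diagonal whose entry has a reciprocal mod m.
        pivot = next(r for r in range(col, n) if gcd(aug[r][col], m) == 1)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, m)
        aug[col] = [(inv * x) % m for x in aug[col]]          # scale pivot row
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [(x - f * y) % m for x, y in zip(aug[r], aug[col])]
    T = [row[n:] for row in aug]                               # this is (A^-1)^T
    return [[T[j][i] for j in range(n)] for i in range(n)]    # transpose back


# Known plaintext DEAR and ciphertext IOSB from Example 8, as rows.
P = [[4, 5], [1, 18]]
C = [[9, 15], [19, 2]]
print(deciphering_matrix(P, C))
```

For the data of Example 8 the intermediate matrices match the hand computation step for step, because the first-row and second-row pivots (9 and 5) both have reciprocals modulo 26.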



Further Readings 

Readers interested in learning more about mathematical cryptography are referred to the following books, the first 
of which is elementary and the second more advanced. 

1. Abraham Sinkov, Elementary Cryptanalysis, a Mathematical Approach (Mathematical Association of America, 2009).

2. Alan G. Konheim, Cryptography, a Primer (New York: Wiley-Interscience, 1981). 



Exercise Set 10.15 



1. Obtain the Hill cipher of the message

DARK NIGHT

for each of the following enciphering matrices:

(a) $\begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 4 & 3 \\ 1 & 2 \end{bmatrix}$

Answer:

(a) GIYUOKEVBH

(b) SFANEFZWJH



2. In each part determine whether the matrix is invertible modulo 26. If so, find its inverse modulo 26 and check your work
by verifying that $AA^{-1} = A^{-1}A = I$ (mod 26).



(b)^ 
(d)^ 

(f) A 



4; 3] 



Answer:

(a) $A^{-1} = \begin{bmatrix} 12 & 7 \\ 23 & 15 \end{bmatrix}$

(b) Not invertible

(c) .-i_r 1 

^ -[23 24J 

(d) Not invertible

(e) Not invertible

(f) $A^{-1} = \begin{bmatrix} 15 & 12 \\ 21 & 5 \end{bmatrix}$



3. Decode the message 



SAKNOXAOJX 



given that it is a Hill cipher with enciphering matrix 



[3^] 



Answer: 

WE LOVE MATH 

4. A Hill 2-cipher is intercepted that starts with the pairs

SL HK

Find the deciphering and enciphering matrices, given that the plaintext is known to start with the word ARMY.

Answer:

Deciphering matrix = $\begin{bmatrix} 7 & 15 \\ 6 & 5 \end{bmatrix}$; enciphering matrix = $\begin{bmatrix} 7 & 5 \\ 2 & 15 \end{bmatrix}$



5. Decode the following Hill 2-cipher if the last four plaintext letters are known to be ATOM. 



LNQIHQYBVRBNJYQO 



Answer: 

THEY SPLIT THE ATOM 

6. Decode the following Hill 3-cipher if the first nine plaintext letters are IHAVECOME:

HPAFQOODUODDHPOODYNOR 

Answer: 

I HAVE COME TO BURY CAESAR 

7. All of the results of this section can be generalized to the case where the plaintext is a binary message; that is, it is a
sequence of 0's and 1's. In this case we do all of our modular arithmetic using modulus 2 rather than modulus 26. Thus,
for example, 1 + 1 = 0 (mod 2). Suppose we want to encrypt the message 110101111. Let us first break it into triplets to
form the three vectors

\[ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

and let us take

\[ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \]

as our enciphering matrix.

(a) Find the encoded message.

(b) Find the inverse modulo 2 of the enciphering matrix, and verify that it decodes your encoded message.

Answer:

(a) 010110001

(b) $\begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$


8. If, in addition to the standard alphabet, a period, comma, and question mark were allowed, then 29 plaintext and 
ciphertext symbols would be available and all matrix arithmetic would be done modulo 29. Under what conditions 
would a matrix with entries in Z29 be invertible modulo 29? 

Answer: 

$A$ is invertible modulo 29 if and only if $\det(A) \ne 0$ (mod 29).

9. Show that the modular equation 4x = 1 (mod 26) has no solution in $Z_{26}$ by successively substituting the values
x = 0, 1, 2, ..., 25.

10. (a) Let $P$ and $C$ be the matrices in Theorem 10.15.5. Show that $P = C(A^{-1})^T$.

(b) To prove Theorem 10.15.5, let $E_1, E_2, \ldots, E_n$ be the elementary matrices that correspond to the row operations that
reduce $C$ to $I$, so

\[ E_n \cdots E_2 E_1 C = I \]

Show that

\[ E_n \cdots E_2 E_1 P = (A^{-1})^T \]

from which it follows that the same sequence of row operations that reduces $C$ to $I$ converts $P$ to $(A^{-1})^T$.

11. (a) If $A$ is the enciphering matrix of a Hill n-cipher, show that

\[ A^{-1} = (C^{-1}P)^T \pmod{26} \]

where $C$ and $P$ are the matrices defined in Theorem 10.15.5.

(b) Instead of using Theorem 10.15.5 as in the text, find the deciphering matrix of Example 8 by using the result in
part (a) and Equation 2 to compute $C^{-1}$. [Note: Although this method is practical for Hill 2-ciphers, Theorem
10.15.5 is more efficient for Hill n-ciphers with n > 2.]

Section 10.15 Technology Exercises

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica,
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with 
some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular 
utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. 
Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of 
the problems in the regular exercise sets. 

T1. Two integers that have no common factors (except 1) are said to be relatively prime. Given a positive integer $n$, let
$S_n = \{a_1, a_2, a_3, \ldots, a_m\}$, where $a_1 < a_2 < a_3 < \cdots < a_m$, be the set of all positive integers less than $n$ and relatively
prime to $n$. For example, if $n = 9$, then

\[ S_9 = \{a_1, a_2, a_3, \ldots, a_6\} = \{1, 2, 4, 5, 7, 8\} \]

(a) Construct a table consisting of $n$ and $S_n$ for $n = 2, 3, \ldots, 15$, and then compute

\[ \sum_{k=1}^{m} a_k \quad \text{and} \quad \left( \sum_{k=1}^{m} a_k \right) \pmod{n} \]

in each case. Draw a conjecture for $n > 15$ and prove your conjecture to be true. [Hint: Use the fact that if $a$ is
relatively prime to $n$, then $n - a$ is also relatively prime to $n$.]

(b) Given a positive integer $n$ and the set $S_n$, let $P_n$ be the $m \times m$ matrix

\[ P_n = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{m-1} & a_m \\ a_2 & a_3 & a_4 & \cdots & a_m & a_1 \\ a_3 & a_4 & a_5 & \cdots & a_1 & a_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{m-1} & a_m & a_1 & \cdots & a_{m-3} & a_{m-2} \\ a_m & a_1 & a_2 & \cdots & a_{m-2} & a_{m-1} \end{bmatrix} \]

so that, for example,

\[ P_9 = \begin{bmatrix} 1 & 2 & 4 & 5 & 7 & 8 \\ 2 & 4 & 5 & 7 & 8 & 1 \\ 4 & 5 & 7 & 8 & 1 & 2 \\ 5 & 7 & 8 & 1 & 2 & 4 \\ 7 & 8 & 1 & 2 & 4 & 5 \\ 8 & 1 & 2 & 4 & 5 & 7 \end{bmatrix} \]

Use a computer to compute $\det(P_n)$ and $\det(P_n)$ (mod $n$) for $n = 2, 3, \ldots, 15$, and then use these results to construct a
conjecture.

(c) Use the results of part (a) to prove your conjecture to be true. [Hint: Add the first $m - 1$ rows of $P_n$ to its last row and
then use Theorem 2.2.3.] What do these results imply about the inverse of $P_n$ (mod $n$)?

T2. Given a positive integer $n$ greater than 1, the number of positive integers less than $n$ and relatively prime to $n$ is called
the Euler phi function of $n$ and is denoted by $\varphi(n)$. For example, $\varphi(6) = 2$ since only two positive integers (1 and 5) are
less than 6 and have no common factor with 6.

(a) Using a computer, for each value of $n = 2, 3, \ldots, 25$ compute and print out all positive integers that are less than $n$ and
relatively prime to $n$. Then use these integers to determine the values of $\varphi(n)$ for $n = 2, 3, \ldots, 25$. Can you discover a
pattern in the results?

(b) It can be shown that if $\{p_1, p_2, p_3, \ldots, p_m\}$ are all the distinct prime factors of $n$, then

\[ \varphi(n) = n \left(1 - \frac{1}{p_1}\right) \left(1 - \frac{1}{p_2}\right) \left(1 - \frac{1}{p_3}\right) \cdots \left(1 - \frac{1}{p_m}\right) \]

For example, since {2, 3} are the distinct prime factors of 12, we have

\[ \varphi(12) = 12 \left(1 - \frac{1}{2}\right) \left(1 - \frac{1}{3}\right) = 4 \]

which agrees with the fact that {1, 5, 7, 11} are the only positive integers less than 12 and relatively prime to 12.
Using a computer, print out all the prime factors of $n$ for $n = 2, 3, \ldots, 25$. Then compute $\varphi(n)$ using the formula above
and compare it to your results in part (a).

10.16 Genetics



In this section we investigate the propagation of an inherited trait in successive generations by computing 
powers of a matrix. 


Prerequisites 

Eigenvalues and Eigenvectors 
Diagonalization of a Matrix 
Intuitive Understanding of Limits 



Inheritance Traits 

In this section we examine the inheritance of traits in animals or plants. The inherited trait under consideration 
is assumed to be governed by a set of two genes, which we designate by A and a. Under autosomal 
inheritance each individual in the population of either gender possesses two of these genes, the possible 
pairings being designated AA, Aa, and aa. This pair of genes is called the individual's genotype, and it 
determines how the trait controlled by the genes is manifested in the individual. For example, in snapdragons 
a set of two genes determines the color of the flower. Genotype AA produces red flowers, genotype Aa 
produces pink flowers, and genotype aa produces white flowers. In humans, eye coloration is controlled
through autosomal inheritance. Genotypes AA and Aa have brown eyes, and genotype aa has blue eyes. In this
case we say that gene A dominates gene a, or that gene a is recessive to gene A, because genotype Aa has the
same outward trait as genotype AA.

In addition to autosomal inheritance we will also discuss X-linked inheritance. In this type of inheritance, the
male of the species possesses only one of the two possible genes (A or a), and the female possesses a pair of
the two genes (AA, aa, or Aa). In humans, color blindness, hereditary baldness, hemophilia, and muscular
dystrophy, to name a few, are traits controlled by X-linked inheritance. 

Below we explain the manner in which the genes of the parents are passed on to their offspring for the two 
types of inheritance. We construct matrix models that give the probable genotypes of the offspring in terms of 
the genotypes of the parents, and we use these matrix models to follow the genotype distribution of a 
population through successive generations. 

Autosomal Inheritance 

In autosomal inheritance an individual inherits one gene from each of its parents' pairs of genes to form its 
own particular pair. As far as we know, it is a matter of chance which of the two genes a parent passes on to 
the offspring. Thus, if one parent is of genotype Aa, it is equally likely that the offspring will inherit the A 



gene or the a gene from that parent. If one parent is of genotype aa and the other parent is of genotype Aa, the 
offspring will always receive an a gene from the aa parent and will receive either an A gene or an a gene, with 
equal probability, from the Aa parent. Consequently, each of the offspring has equal probability of being 
genotype aa or Aa. In Table 1 we list the probabilities of the possible genotypes of the offspring for all
possible combinations of the genotypes of the parents. 

Table 1

Genotype of     Genotypes of Parents
Offspring       AA-AA   AA-Aa   AA-aa   Aa-Aa   Aa-aa   aa-aa
AA              1       1/2     0       1/4     0       0
Aa              0       1/2     1       1/2     1/2     0
aa              0       0       0       1/4     1/2     1



EXAMPLE 1 Distribution of Genotypes in a Population

Suppose that a farmer has a large population of plants consisting of some distribution of all 
three possible genotypes AA, Aa, and aa. The farmer desires to undertake a breeding program in 
which each plant in the population is always fertilized with a plant of genotype AA and is then 
replaced by one of its offspring. We want to derive an expression for the distribution of the 
three possible genotypes in the population after any number of generations. 

For $n = 0, 1, 2, \ldots$, let us set

$a_n$ = fraction of plants of genotype AA in nth generation
$b_n$ = fraction of plants of genotype Aa in nth generation
$c_n$ = fraction of plants of genotype aa in nth generation

Thus $a_0$, $b_0$, and $c_0$ specify the initial distribution of the genotypes. We also have that

\[ a_n + b_n + c_n = 1 \quad \text{for } n = 0, 1, 2, \ldots \]

From Table 1 we can determine the genotype distribution of each generation from the genotype
distribution of the preceding generation by the following equations:

\[ \begin{aligned} a_n &= a_{n-1} + \tfrac{1}{2} b_{n-1} \\ b_n &= c_{n-1} + \tfrac{1}{2} b_{n-1} \\ c_n &= 0 \end{aligned} \qquad n = 1, 2, \ldots \qquad (1) \]

For example, the first of these three equations states that all the offspring of a plant of genotype 
AA will be of genotype AA under this breeding program and that half of the offspring of a plant 
of genotype Aa will be of genotype AA. 



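Equations 1 can be written in matrix notation as

\[ \mathbf{x}^{(n)} = M \mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \qquad (2) \]

where

\[ \mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix}, \quad \mathbf{x}^{(n-1)} = \begin{bmatrix} a_{n-1} \\ b_{n-1} \\ c_{n-1} \end{bmatrix}, \quad M = \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix} \]

Note that the three columns of the matrix $M$ are the same as the first three columns of Table 1.
From Equation 2 it follows that

\[ \mathbf{x}^{(n)} = M \mathbf{x}^{(n-1)} = M^2 \mathbf{x}^{(n-2)} = \cdots = M^n \mathbf{x}^{(0)} \qquad (3) \]

Consequently, if we can find an explicit expression for $M^n$, we can use 3 to obtain an explicit
expression for $\mathbf{x}^{(n)}$. To find an explicit expression for $M^n$, we first diagonalize $M$. That is, we
find an invertible matrix $P$ and a diagonal matrix $D$ such that

\[ M = PDP^{-1} \]

With such a diagonalization, we then have (see Exercise 1)

\[ M^n = PD^nP^{-1} \quad \text{for } n = 1, 2, \ldots \qquad (4) \]

where

\[ D^n = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix}^n = \begin{bmatrix} \lambda_1^n & 0 & \cdots & 0 \\ 0 & \lambda_2^n & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_k^n \end{bmatrix} \]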

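The diagonalization of $M$ is accomplished by finding its eigenvalues and corresponding
eigenvectors. These are as follows (verify):

\[ \text{Eigenvalues:} \quad \lambda_1 = 1, \quad \lambda_2 = \tfrac{1}{2}, \quad \lambda_3 = 0 \]

\[ \text{Corresponding eigenvectors:} \quad \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \]

Thus, in Equation 4 we have

\[ D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

and

\[ P = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \mathbf{v}_3] = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \]

Therefore,

\[ \mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} \]

or

\[ \mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix} = \begin{bmatrix} 1 & 1 - \left(\tfrac{1}{2}\right)^n & 1 - \left(\tfrac{1}{2}\right)^{n-1} \\ 0 & \left(\tfrac{1}{2}\right)^n & \left(\tfrac{1}{2}\right)^{n-1} \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} \]

Using the fact that $a_0 + b_0 + c_0 = 1$, we thus have

\[ \begin{aligned} a_n &= 1 - \left(\tfrac{1}{2}\right)^n b_0 - \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ b_n &= \left(\tfrac{1}{2}\right)^n b_0 + \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ c_n &= 0 \end{aligned} \qquad n = 1, 2, \ldots \qquad (5) \]

These are explicit formulas for the fractions of the three genotypes in the nth generation of
plants in terms of the initial genotype fractions.

Because $\left(\tfrac{1}{2}\right)^n$ tends to zero as $n$ approaches infinity, it follows from these equations that

\[ a_n \to 1, \quad b_n \to 0, \quad c_n = 0 \]

as $n$ approaches infinity. That is, in the limit all plants in the population will be genotype AA.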
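
As a quick numerical check of Example 1, the following sketch iterates the transition matrix exactly (using rational arithmetic) and compares each generation with the closed-form expressions $a_n = 1 - (1/2)^n b_0 - (1/2)^{n-1} c_0$, $b_n = (1/2)^n b_0 + (1/2)^{n-1} c_0$, $c_n = 0$. The initial distribution chosen here is a hypothetical example, and the helper names are illustrative only.

```python
from fractions import Fraction as F

# Transition matrix for fertilization with genotype AA (first three columns of Table 1).
M = [[F(1), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1)],
     [F(0), F(0),    F(0)]]


def step(M, x):
    """One generation: return the matrix-vector product M x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def closed_form(n, a0, b0, c0):
    """Closed-form genotype fractions for generation n >= 1."""
    h = F(1, 2) ** n                       # (1/2)^n; note (1/2)^(n-1) = 2h
    return [1 - h * b0 - 2 * h * c0, h * b0 + 2 * h * c0, F(0)]


x0 = [F(1, 4), F(1, 2), F(1, 4)]           # hypothetical initial distribution
x = x0
for n in range(1, 11):
    x = step(M, x)
    assert x == closed_form(n, *x0)        # iteration agrees with the formulas

print([float(v) for v in x])               # after 10 generations, nearly all AA
```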



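EXAMPLE 2 Modifying Example 1

We can modify Example 1 so that instead of each plant being fertilized with one of genotype
AA, each plant is fertilized with a plant of its own genotype. Using the same notation as in
Example 1, we then find

\[ \mathbf{x}^{(n)} = M^n \mathbf{x}^{(0)} \]

where

\[ M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{4} & 1 \end{bmatrix} \]

The columns of this new matrix $M$ are the same as the columns of Table 1 corresponding to
parents with genotypes AA-AA, Aa-Aa, and aa-aa.

The eigenvalues of $M$ are (verify)

\[ \lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2} \]

The eigenvalue $\lambda_1 = 1$ has multiplicity two and its corresponding eigenspace is
two-dimensional. Picking two linearly independent eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in that eigenspace,
and a single eigenvector $\mathbf{v}_3$ for the simple eigenvalue $\lambda_3 = \tfrac{1}{2}$, we have (verify)

\[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \]

The calculations for $\mathbf{x}^{(n)}$ are then

\[ \mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix} \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & -\tfrac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} = \begin{bmatrix} 1 & \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1} & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n & 0 \\ 0 & \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1} & 1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} \]

Thus,

\[ \begin{aligned} a_n &= a_0 + \left[\tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1}\right] b_0 \\ b_n &= \left(\tfrac{1}{2}\right)^n b_0 \\ c_n &= c_0 + \left[\tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1}\right] b_0 \end{aligned} \qquad n = 1, 2, \ldots \qquad (6) \]

In the limit, as $n$ tends to infinity, $\left(\tfrac{1}{2}\right)^n \to 0$ and $\left(\tfrac{1}{2}\right)^{n+1} \to 0$, so

\[ a_n \to a_0 + \tfrac{1}{2} b_0, \quad b_n \to 0, \quad c_n \to c_0 + \tfrac{1}{2} b_0 \]

Thus, fertilization of each plant with one of its own genotype produces a population that in the
limit contains only genotypes AA and aa.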



Autosomal Recessive Diseases 

There are many genetic diseases governed by autosomal inheritance in which a normal gene A dominates an 
abnormal gene a. Genotype AA is a normal individual; genotype Aa is a carrier of the disease but is not 
afflicted with the disease; and genotype aa is afflicted with the disease. In humans such genetic diseases are 
often associated with a particular racial group — for instance, cystic fibrosis (predominant among Caucasians), 
sickle-cell anemia (predominant among people of African origin), Cooley's anemia (predominant among 
people of Mediterranean origin), and Tay-Sachs disease (predominant among Eastern European Jews). 

Suppose that an animal breeder has a population of animals that carries an autosomal recessive disease. 
Suppose further that those animals afflicted with the disease do not survive to maturity. One possible way to 
control such a disease is for the breeder to always mate a female, regardless of her genotype, with a normal 
male. In this way, all future offspring will either have a normal father and a normal mother (AA-AA matings) 
or a normal father and a carrier mother (AA-Aa matings). There can be no AA-aa matings since animals of 
genotype aa do not survive to maturity. Under this type of mating program no future offspring will be 
afflicted with the disease, although there will still be carriers in future generations. Let us now determine the 
fraction of carriers in future generations. We set 



\[ \mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \end{bmatrix}, \quad n = 1, 2, \ldots \]

where

$a_n$ = fraction of population of genotype AA in nth generation

$b_n$ = fraction of population of genotype Aa (carriers) in nth generation

Because each offspring has at least one normal parent, we may consider the controlled mating program as one
of continual mating with genotype AA, as in Example 1. Thus, the transition of genotype distributions from
one generation to the next is governed by the equation

\[ \mathbf{x}^{(n)} = M \mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \]

where

\[ M = \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{bmatrix} \]

Because we know the initial distribution $\mathbf{x}^{(0)}$, the distribution of genotypes in the nth generation is thus given
by

\[ \mathbf{x}^{(n)} = M^n \mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \]

The diagonalization of $M$ is easily carried out (see Exercise 4) and leads to

\[ \mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 - \left(\tfrac{1}{2}\right)^n \\ 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \end{bmatrix} \]

Because $a_0 + b_0 = 1$, we have

\[ \begin{aligned} a_n &= 1 - \left(\tfrac{1}{2}\right)^n b_0 \\ b_n &= \left(\tfrac{1}{2}\right)^n b_0 \end{aligned} \qquad n = 1, 2, \ldots \qquad (7) \]

Thus, as $n$ tends to infinity, we have

\[ a_n \to 1, \quad b_n \to 0 \]

so in the limit there will be no carriers in the population.

From 7 we see that

\[ b_n = \tfrac{1}{2} b_{n-1}, \quad n = 1, 2, \ldots \qquad (8) \]

That is, the fraction of carriers in each generation is one-half the fraction of carriers in the preceding
generation. It would be of interest also to investigate the propagation of carriers under random mating, when
two animals mate without regard to their genotypes. Unfortunately, such random mating leads to nonlinear
equations, and the techniques of this section are not applicable. However, by other techniques it can be shown
that under random mating, Equation 8 is replaced by

\[ b_n = \frac{b_{n-1}}{1 + \tfrac{1}{2} b_{n-1}}, \quad n = 1, 2, \ldots \qquad (9) \]

As a numerical example, suppose that the breeder starts with a population in which 10% of the animals are
carriers. Under the controlled-mating program governed by Equation 8, the percentage of carriers can be
reduced to 5% in one generation. But under random mating, Equation 9 predicts that 9.5% of the population
will be carriers after one generation ($b_n = .095$ if $b_{n-1} = .10$). In addition, under controlled mating no
offspring will ever be afflicted with the disease, but with random mating it can be shown that about 1 in 400
offspring will be born with the disease when 10% of the population are carriers.
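
The numerical comparison above can be reproduced with a small sketch. It assumes that the controlled-mating recurrence is $b_n = \tfrac{1}{2} b_{n-1}$ (Equation 8) and that the random-mating recurrence has the form $b_n = b_{n-1} / (1 + \tfrac{1}{2} b_{n-1})$, which is consistent with the 10% to 9.5% figure quoted in the text; the function names are hypothetical. Exact fractions are used so that the generation count is not disturbed by rounding.

```python
from fractions import Fraction as F


def controlled_step(b):
    """Carrier fraction after one generation of controlled mating (Equation 8)."""
    return b / 2


def random_step(b):
    """Carrier fraction after one generation of random mating (assumed form of Equation 9)."""
    return b / (1 + b / 2)


def generations_until(b, target, step):
    """Number of generations needed for the carrier fraction to drop to target or below."""
    n = 0
    while b > target:
        b = step(b)
        n += 1
    return n


b0 = F(1, 10)
print(float(controlled_step(b0)))   # 0.05
print(float(random_step(b0)))       # about 0.0952
print(generations_until(F(1, 4), F(1, 10), random_step))
```

Under the random-mating recurrence the reciprocal $1/b_n$ grows by exactly $\tfrac{1}{2}$ per generation, which is why the decline is so much slower than the halving produced by controlled mating.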

X-Linked Inheritance

As mentioned in the introduction, in X-linked inheritance the male possesses one gene (A or a) and the female
possesses two genes (AA, Aa, or aa). The term X-linked is used because such genes are found on the
X-chromosome, of which the male has one and the female has two. The inheritance of such genes is as
follows: A male offspring receives one of his mother's two genes with equal probability, and a female
offspring receives the one gene of her father and one of her mother's two genes with equal probability.
Readers familiar with basic probability can verify that this type of inheritance leads to the genotype
probabilities in Table 2.

Table 2

We will discuss a program of inbreeding in connection with X-linked inheritance. We begin with a male and
female; select two of their offspring at random, one of each gender, and mate them; select two of the resulting
offspring and mate them; and so forth. Such inbreeding is commonly performed with animals. (Among
humans, such brother-sister marriages were used by the rulers of ancient Egypt to keep the royal line pure.)

The original male-female pair can be one of the six types, corresponding to the six columns of Table 2:

(A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa)

The sibling pairs mated in each successive generation have certain probabilities of being one of these six
types. To compute these probabilities, for $n = 0, 1, 2, \ldots$, let us set

$a_n$ = probability sibling-pair mated in nth generation is type (A, AA)

$b_n$ = probability sibling-pair mated in nth generation is type (A, Aa)

$c_n$ = probability sibling-pair mated in nth generation is type (A, aa)

$d_n$ = probability sibling-pair mated in nth generation is type (a, AA)

$e_n$ = probability sibling-pair mated in nth generation is type (a, Aa)

$f_n$ = probability sibling-pair mated in nth generation is type (a, aa)

With these probabilities we form a column vector

\[ \mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \\ d_n \\ e_n \\ f_n \end{bmatrix}, \quad n = 0, 1, 2, \ldots \]

From Table 2 it follows that

\[ \mathbf{x}^{(n)} = M \mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \qquad (10) \]

where, with both the rows and the columns ordered as (A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa),

\[ M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 & 1 & \tfrac{1}{4} & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{4} & 1 & 0 & \tfrac{1}{4} & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{4} & 1 \end{bmatrix} \]

For example, suppose that in the (n - 1)-st generation, the sibling pair mated is type (A, Aa). Then their
male offspring will be genotype A or a with equal probability, and their female offspring will be genotype AA
or Aa with equal probability. Because one of the male offspring and one of the female offspring are chosen at
random for mating, the next sibling pair will be one of type (A, AA), (A, Aa), (a, AA), or (a, Aa) with
equal probability. Thus, the second column of $M$ contains $\tfrac{1}{4}$ in each of the four rows corresponding to these
four sibling pairs. (See Exercise 9 for the remaining columns.)



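As in our previous examples, it follows from 10 that

\[ \mathbf{x}^{(n)} = M^n \mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \qquad (11) \]

After lengthy calculations, the eigenvalues and eigenvectors of $M$ can be found. The eigenvalues turn out to be

\[ \lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2}, \quad \lambda_4 = -\tfrac{1}{2}, \quad \lambda_5 = \tfrac{1}{4}(1 + \sqrt{5}), \quad \lambda_6 = \tfrac{1}{4}(1 - \sqrt{5}) \]

The diagonalization of $M$ then leads to

\[ \mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \qquad (12) \]

where $P$ is the matrix whose columns are the corresponding eigenvectors of $M$ and $D$ is the diagonal matrix of the
eigenvalues. We will not write out the matrix product in 12, as it is rather unwieldy. However, if a specific vector $\mathbf{x}^{(0)}$ is
given, the calculation for $\mathbf{x}^{(n)}$ is not too cumbersome (see Exercise 6).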



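Because the absolute values of the last four diagonal entries of $D$ are less than 1, we see that as $n$ tends to
infinity,

\[ D^n \to \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]

And so, from Equation 12,

\[ \mathbf{x}^{(n)} \to P \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} P^{-1} \mathbf{x}^{(0)} \qquad (13) \]

Performing the matrix multiplication on the right, we obtain (verify)

\[ \mathbf{x}^{(n)} \to \begin{bmatrix} a_0 + \tfrac{2}{3} b_0 + \tfrac{1}{3} c_0 + \tfrac{2}{3} d_0 + \tfrac{1}{3} e_0 \\ 0 \\ 0 \\ 0 \\ 0 \\ f_0 + \tfrac{1}{3} b_0 + \tfrac{2}{3} c_0 + \tfrac{1}{3} d_0 + \tfrac{2}{3} e_0 \end{bmatrix} \]

That is, in the limit all sibling pairs will be either type (A, AA) or type (a, aa). For example, if the initial
parents are type (A, Aa) (that is, $b_0 = 1$ and $a_0 = c_0 = d_0 = e_0 = f_0 = 0$), then as $n$ tends to infinity,

\[ \mathbf{x}^{(n)} \to \begin{bmatrix} \tfrac{2}{3} \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix} \]

Thus, in the limit there is probability $\tfrac{2}{3}$ that the sibling pairs will be (A, AA), and probability $\tfrac{1}{3}$ that they will
be (a, aa).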
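
The limiting behavior can also be observed numerically without diagonalizing. The sketch below assumes the 6 × 6 transition matrix of the inbreeding model with pair types ordered (A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa); the function name and the choice of 200 generations are illustrative assumptions, not part of the text.

```python
Q = 0.25

# Column j holds the probabilities of the next sibling-pair type, given current type j.
M = [[1, Q, 0, 0, 0, 0],
     [0, Q, 0, 1, Q, 0],
     [0, 0, 0, 0, Q, 0],
     [0, Q, 0, 0, 0, 0],
     [0, Q, 1, 0, Q, 0],
     [0, 0, 0, 0, Q, 1]]


def distribution_after(x0, generations=200):
    """Apply the transition matrix repeatedly to the initial probability vector x0."""
    x = list(x0)
    for _ in range(generations):
        x = [sum(M[i][j] * x[j] for j in range(6)) for i in range(6)]
    return x


# Initial parents of type (A, Aa): b0 = 1, all other entries 0.
x = distribution_after([0, 1, 0, 0, 0, 0])
print([round(v, 6) for v in x])   # approaches (2/3, 0, 0, 0, 0, 1/3)
```

Because the eigenvalues other than 1 all have absolute value below 0.81, a couple of hundred generations is far more than enough for the printed values to agree with the limit to six decimal places.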



Exercise Set 10.16 

1. Show that if $M = PDP^{-1}$, then $M^n = PD^nP^{-1}$ for $n = 1, 2, \ldots$.

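2. In Example 1 suppose that the plants are always fertilized with a plant of genotype Aa rather than one of
genotype AA. Derive formulas for the fractions of the plants of genotypes AA, Aa, and aa in the nth
generation. Also, find the limiting genotype distribution as n tends to infinity.

Answer:

\[ a_n = \tfrac{1}{4} + \left(\tfrac{1}{2}\right)^{n+1}(a_0 - c_0), \quad b_n = \tfrac{1}{2}, \quad c_n = \tfrac{1}{4} - \left(\tfrac{1}{2}\right)^{n+1}(a_0 - c_0), \quad n = 1, 2, \ldots \]

\[ a_n \to \tfrac{1}{4}, \quad b_n \to \tfrac{1}{2}, \quad c_n \to \tfrac{1}{4} \quad \text{as } n \to \infty \]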

3. In Example 1 suppose that the initial plants are fertilized with genotype AA, the first generation is 
fertilized with genotype Aa, the second generation is fertilized with genotype AA, and this alternating 
pattern of fertilization is kept up. Find formulas for the fractions of the plants of genotypes AA, Aa, and aa 
in the nth generation.

Answer: 



) « = 0, 1,2,... 



2 1 \ 
^211+1 = J - -^^^C2<30 - Ao -4co) 

^2n = \ )n=\,2,. 

4. In the section on autosomal recessive diseases, find the eigenvalues and eigenvectors of the matrix M and 
verify Equation 7. 



Answer: 



1. 



Eigenvalues: Ai = 1, A2 = "ir; eigenvectors: ei 



5. Suppose that a breeder has an animal population in which 25% of the population are carriers of an
autosomal recessive disease. If the breeder allows the animals to mate irrespective of their genotype, use
Equation 9 to calculate the number of generations required for the percentage of carriers to fall from 25%
to 10%. If the breeder instead implements the controlled-mating program determined by Equation 8, what
will the percentage of carriers be after the same number of generations?



Answer: 



12 generations; .006% 

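6. In the section on X-linked inheritance, suppose that the initial parents are equally likely to be of any of the
six possible genotype parents; that is,

\[
\mathbf{x}^{(0)} = \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} \end{bmatrix}^T
\]

Using Equation 12, calculate $\mathbf{x}^{(n)}$ and also calculate the limit of $\mathbf{x}^{(n)}$ as n tends to infinity.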



Answer: 



i + T • ^[(-3- + /S^' + (-3+ /5)(1 - /5r'l 



3 4«+2 



in+l 



7+T-^l(-3-/5)(i+/5)"-"+(-3+,^)(i-,/5r'l 



as«— »oo 



• M+2 



1 
2 
0 
0 
0 
0 

1 

2 

7. From 13 show that under X-linked inheritance with inbreeding, the probability that the limiting sibling 
pairs will be of type (A, AA) is the same as the proportion of A genes in the initial population. 

8. In X-linked inheritance suppose that none of the females of genotype Aa survive to maturity. Under 
inbreeding the possible sibling pairs are then 

(A, AA), (A, aa), (a, AA), and (a, aa)

Find the transition matrix that describes how the genotype distribution changes in one generation. 



Answer: 



\[
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

9. Derive the matrix M in Equation 10 from Table 2. 



Section 10.16 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra 
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 



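T1.

(a) Use a computer to verify that the eigenvalues and eigenvectors of

\[
M = \begin{bmatrix}
1 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{4} & 0 & 1 & \tfrac{1}{4} & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{4} & 0 \\
0 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{4} & 1 & 0 & \tfrac{1}{4} & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{4} & 1
\end{bmatrix}
\]

as given in the text are correct.

(b) Starting with $\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}$ and the assumption that

\[
\lim_{n \to \infty} \mathbf{x}^{(n)} = \mathbf{x}
\]

exists, we must have

\[
\lim_{n \to \infty} \mathbf{x}^{(n)} = M \lim_{n \to \infty} \mathbf{x}^{(n-1)} \quad \text{or} \quad \mathbf{x} = M\mathbf{x}
\]

This suggests that $\mathbf{x}$ can be solved directly using the equation $(M - I)\mathbf{x} = \mathbf{0}$. Use a computer to solve the
equation $\mathbf{x} = M\mathbf{x}$, where

\[
\mathbf{x} = \begin{bmatrix} a & b & c & d & e & f \end{bmatrix}^T
\]

and $a + b + c + d + e + f = 1$; compare your results to Equation 13. Explain why the solution to
$(M - I)\mathbf{x} = \mathbf{0}$ along with $a + b + c + d + e + f = 1$ is not specific enough to determine $\lim_{n \to \infty} \mathbf{x}^{(n)}$.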



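T2.

(a) Given

\[
P = \begin{bmatrix}
1 & 0 & -1 & 1 & \tfrac{1}{4}(-3 - \sqrt{5}) & \tfrac{1}{4}(-3 + \sqrt{5}) \\
0 & 0 & 2 & -6 & 1 & 1 \\
0 & 0 & -1 & -3 & \tfrac{1}{4}(-1 + \sqrt{5}) & \tfrac{1}{4}(-1 - \sqrt{5}) \\
0 & 0 & 1 & 3 & \tfrac{1}{4}(-1 + \sqrt{5}) & \tfrac{1}{4}(-1 - \sqrt{5}) \\
0 & 0 & -2 & 6 & 1 & 1 \\
0 & 1 & 1 & -1 & \tfrac{1}{4}(-3 - \sqrt{5}) & \tfrac{1}{4}(-3 + \sqrt{5})
\end{bmatrix}
\]

from Equation 12 and

\[
\lim_{n \to \infty} D^n = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

use a computer to show that

\[
\lim_{n \to \infty} M^n = \begin{bmatrix}
1 & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1
\end{bmatrix}
\]

(b) Use a computer to calculate $M^n$ for n = 10, 20, 30, 40, 50, 60, 70, and then compare your results to the
limit in part (a).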



Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 



10.17 Age-Specific Population Growth 

In this section we investigate, using the Leslie matrix model, the growth over time of a female population that 
is divided into age classes. We then determine the limiting age distribution and growth rate of the population. 



Prerequisites 

Eigenvalues and Eigenvectors 
Diagonalization of a Matrix 
Intuitive Understanding of Limits 






One of the most common models of population growth used by demographers is the so-called Leslie model 
developed in the 1940s. This model describes the growth of the female portion of a human or animal 
population. In this model the females are divided into age classes of equal duration. To be specific, suppose 
that the maximum age attained by any female in the population is L years (or some other time unit) and we 
divide the population into n age classes. Then each class is L/n years in duration. We label the age classes
according to Table 1.

Table 1

Age Class    Age Interval
1            [0, L/n)
2            [L/n, 2L/n)
3            [2L/n, 3L/n)
...          ...
n - 1        [(n - 2)L/n, (n - 1)L/n)
n            [(n - 1)L/n, L]



Suppose that we know the number of females in each of the n classes at time t = 0. In particular, let there be
$x_1^{(0)}$ females in the first class, $x_2^{(0)}$ females in the second class, and so forth. With these n numbers we form a
column vector:

\[
\mathbf{x}^{(0)} = \begin{bmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{bmatrix}
\]

We call this vector the initial age distribution vector.



As time progresses, the number of females within each of the n classes changes because of three biological 
processes: birth, death, and aging. By describing these three processes quantitatively, we will see how to 
project the initial age distribution vector into the future. 

The easiest way to study the aging process is to observe the population at discrete times, say
$t_0, t_1, t_2, \ldots, t_k, \ldots$. The Leslie model requires that the duration between any two successive observation
times be the same as the duration of the age intervals. Therefore, we set

\[
\begin{aligned}
t_0 &= 0 \\
t_1 &= L/n \\
t_2 &= 2L/n \\
&\;\;\vdots \\
t_k &= kL/n \\
&\;\;\vdots
\end{aligned}
\]

With this assumption, all females in the (i + 1)-st class at time $t_{k+1}$ were in the ith class at time $t_k$.

The birth and death processes between two successive observation times can be described by means of the 
following demographic parameters: 



$a_i$  (i = 1, 2, ..., n)        The average number of daughters born to each female during the
                             time she is in the ith age class

$b_i$  (i = 1, 2, ..., n - 1)    The fraction of females in the ith age class that can be expected to
                             survive and pass into the (i + 1)-st age class

By their definitions, we have that

(i) $a_i \ge 0$ for i = 1, 2, ..., n

(ii) $0 < b_i \le 1$ for i = 1, 2, ..., n - 1

Note that we do not allow any $b_i$ to equal zero, because then no females would survive beyond the ith age
class. We also assume that at least one $a_i$ is positive so that some births occur. Any age class for which the
corresponding value of $a_i$ is positive is called a fertile age class.

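We next define the age distribution vector $\mathbf{x}^{(k)}$ at time $t_k$ by

\[
\mathbf{x}^{(k)} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix}
\]

where $x_i^{(k)}$ is the number of females in the ith age class at time $t_k$. Now, at time $t_k$, the females in the first age
class are just those daughters born between times $t_{k-1}$ and $t_k$. Thus, we can write

  {number of females in class 1 at time $t_k$}
      = {number of daughters born to females in class 1 between times $t_{k-1}$ and $t_k$}
      + {number of daughters born to females in class 2 between times $t_{k-1}$ and $t_k$}
      + ...
      + {number of daughters born to females in class n between times $t_{k-1}$ and $t_k$}

or, mathematically,

\[
x_1^{(k)} = a_1 x_1^{(k-1)} + a_2 x_2^{(k-1)} + \cdots + a_n x_n^{(k-1)} \tag{1}
\]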



The females in the (i + 1)-st age class (i = 1, 2, ..., n - 1) at time $t_k$ are those females in the ith class at
time $t_{k-1}$ who are still alive at time $t_k$. Thus,

  {number of females in class i + 1 at time $t_k$}
      = {fraction of females in class i who survive and pass into class i + 1}
      x {number of females in class i at time $t_{k-1}$}

or, mathematically,

\[
x_{i+1}^{(k)} = b_i x_i^{(k-1)}, \qquad i = 1, 2, \ldots, n - 1 \tag{2}
\]



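Using matrix notation, we can write Equations 1 and 2 as

\[
\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix}
=
\begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \\ \vdots \\ x_n^{(k-1)} \end{bmatrix}
\]

or more compactly as

\[
\mathbf{x}^{(k)} = L\mathbf{x}^{(k-1)}, \qquad k = 1, 2, \ldots \tag{3}
\]

where L is the Leslie matrix

\[
L = \begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix} \tag{4}
\]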



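From Equation 3 it follows that

\[
\begin{aligned}
\mathbf{x}^{(1)} &= L\mathbf{x}^{(0)} \\
\mathbf{x}^{(2)} &= L\mathbf{x}^{(1)} = L^2\mathbf{x}^{(0)} \\
\mathbf{x}^{(3)} &= L\mathbf{x}^{(2)} = L^3\mathbf{x}^{(0)} \\
&\;\;\vdots \\
\mathbf{x}^{(k)} &= L\mathbf{x}^{(k-1)} = L^k\mathbf{x}^{(0)}
\end{aligned} \tag{5}
\]

Thus, if we know the initial age distribution $\mathbf{x}^{(0)}$ and the Leslie matrix L, we can determine the female age
distribution at any later time.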

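EXAMPLE 1 Female Age Distribution for Animals

Suppose that the oldest age attained by the females in a certain animal population is 15 years
and we divide the population into three age classes with equal durations of five years. Let the
Leslie matrix for this population be

\[
L = \begin{bmatrix}
0 & 4 & 3 \\
\tfrac{1}{2} & 0 & 0 \\
0 & \tfrac{1}{4} & 0
\end{bmatrix}
\]

If there are initially 1000 females in each of the three age classes, then from Equation 3 we
have

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{bmatrix}
\]

\[
\mathbf{x}^{(1)} = L\mathbf{x}^{(0)} =
\begin{bmatrix} 0 & 4 & 3 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \end{bmatrix}
\begin{bmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{bmatrix}
= \begin{bmatrix} 7{,}000 \\ 500 \\ 250 \end{bmatrix}
\]

\[
\mathbf{x}^{(2)} = L\mathbf{x}^{(1)} =
\begin{bmatrix} 0 & 4 & 3 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \end{bmatrix}
\begin{bmatrix} 7{,}000 \\ 500 \\ 250 \end{bmatrix}
= \begin{bmatrix} 2{,}750 \\ 3{,}500 \\ 125 \end{bmatrix}
\]

\[
\mathbf{x}^{(3)} = L\mathbf{x}^{(2)} =
\begin{bmatrix} 0 & 4 & 3 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \end{bmatrix}
\begin{bmatrix} 2{,}750 \\ 3{,}500 \\ 125 \end{bmatrix}
= \begin{bmatrix} 14{,}375 \\ 1{,}375 \\ 875 \end{bmatrix}
\]

Thus, after 15 years there are 14,375 females between 0 and 5 years of age, 1375 females
between 5 and 10 years of age, and 875 females between 10 and 15 years of age.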
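The projection in Example 1 is a repeated matrix-vector product, so it is easy to reproduce with a short program. The sketch below is illustrative only; the helper names `leslie_matrix` and `project` are not from the text, and plain Python lists are assumed in place of a linear algebra package.

```python
def leslie_matrix(a, b):
    """Build the Leslie matrix of Equation 4 from birth parameters a and survival fractions b."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    L[0] = list(a)                 # first row: a_1, ..., a_n
    for i, bi in enumerate(b):     # subdiagonal: b_1, ..., b_{n-1}
        L[i + 1][i] = bi
    return L

def project(L, x, steps):
    """Apply x <- L x repeatedly (Equation 3) and return the final vector."""
    for _ in range(steps):
        x = [sum(l * xi for l, xi in zip(row, x)) for row in L]
    return x

L = leslie_matrix(a=[0, 4, 3], b=[0.5, 0.25])
x3 = project(L, [1000, 1000, 1000], steps=3)
print(x3)   # age distribution after three 5-year periods
```

Three steps correspond to 15 years, and the result agrees with the vector $\mathbf{x}^{(3)}$ computed by hand in the example.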



Limiting Behavior



Although Equation 5 gives the age distribution of the population at any time, it does not immediately give a
general picture of the dynamics of the growth process. For this we need to investigate the eigenvalues and
eigenvectors of the Leslie matrix. The eigenvalues of L are the roots of its characteristic polynomial. As we
ask you to verify in Exercise 2, this characteristic polynomial is

\[
p(\lambda) = |\lambda I - L|
= \lambda^n - a_1\lambda^{n-1} - a_2 b_1 \lambda^{n-2} - a_3 b_1 b_2 \lambda^{n-3} - \cdots - a_n b_1 b_2 \cdots b_{n-1}
\]

To analyze the roots of this polynomial, it will be convenient to introduce the function

\[
q(\lambda) = \frac{a_1}{\lambda} + \frac{a_2 b_1}{\lambda^2} + \frac{a_3 b_1 b_2}{\lambda^3} + \cdots + \frac{a_n b_1 b_2 \cdots b_{n-1}}{\lambda^n} \tag{6}
\]

Using this function, the characteristic equation $p(\lambda) = 0$ can be written (verify)

\[
q(\lambda) = 1 \quad \text{for } \lambda \ne 0 \tag{7}
\]

Because all the $a_i$ and $b_i$ are nonnegative, we see that $q(\lambda)$ is monotonically decreasing for $\lambda$ greater than
zero. Furthermore, $q(\lambda)$ has a vertical asymptote at $\lambda = 0$ and approaches zero as $\lambda \to \infty$. Consequently, as
Figure 10.17.1 indicates, there is a unique $\lambda$, say $\lambda = \lambda_1$, such that $q(\lambda_1) = 1$. That is, the matrix L has a
unique positive eigenvalue. It can also be shown (see Exercise 3) that $\lambda_1$ has multiplicity 1; that is, $\lambda_1$ is not a
repeated root of the characteristic equation. Although we omit the computational details, you can verify that
an eigenvector corresponding to $\lambda_1$ is

\[
\mathbf{x}_1 = \begin{bmatrix}
1 \\
b_1/\lambda_1 \\
b_1 b_2/\lambda_1^2 \\
b_1 b_2 b_3/\lambda_1^3 \\
\vdots \\
b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1}
\end{bmatrix} \tag{8}
\]

Because $\lambda_1$ has multiplicity 1, its corresponding eigenspace has dimension 1 (Exercise 3), and so any
eigenvector corresponding to it is some multiple of $\mathbf{x}_1$. We can summarize these results in the following
theorem.




Figure 10.17.1 



THEOREM 10.17.1 Existence of a Positive Eigenvalue 

A Leslie matrix L has a unique positive eigenvalue $\lambda_1$. This eigenvalue has multiplicity 1 and an
eigenvector $\mathbf{x}_1$ all of whose entries are positive.
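Because $q(\lambda)$ is decreasing for positive $\lambda$, the unique positive eigenvalue can be located by bisection on the equation $q(\lambda) = 1$. The following sketch illustrates that idea; it is not the numerical method used in the text. The function names, the tolerance, and the starting bracket are assumptions, and the bracket presumes that at least one $a_i$ is positive so that $q$ exceeds 1 for very small $\lambda$.

```python
def q(lam, a, b):
    """Evaluate q(lambda) of Equation 6 for birth parameters a and survival fractions b."""
    total, survival = 0.0, 1.0
    for i, ai in enumerate(a):
        if i > 0:
            survival *= b[i - 1]          # running product b_1 b_2 ... b_i
        total += ai * survival / lam ** (i + 1)
    return total

def positive_eigenvalue(a, b, tol=1e-12):
    """Solve q(lambda) = 1 by bisection; q is decreasing for lambda > 0."""
    lo, hi = 1e-9, 1.0                    # assumed bracket: q(lo) > 1
    while q(hi, a, b) > 1.0:              # widen until q(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid, a, b) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Leslie matrix of Example 1: a = (0, 4, 3), b = (1/2, 1/4)
lam1 = positive_eigenvalue([0, 4, 3], [0.5, 0.25])
print(round(lam1, 6))
```

For the matrix of Example 1 the computed root is 3/2, which can be confirmed by substituting into the characteristic polynomial $\lambda^3 - 2\lambda - \tfrac{3}{8}$.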



We will now show that the long-term behavior of the age distribution of the population is determined by the
positive eigenvalue $\lambda_1$ and its eigenvector $\mathbf{x}_1$. In Exercise 9 we ask you to prove the following result.



THEOREM 10.17.2 Eigenvalues of a Leslie Matrix 

If $\lambda_1$ is the unique positive eigenvalue of a Leslie matrix L, and $\lambda_k$ is any other real or complex
eigenvalue of L, then $|\lambda_k| \le \lambda_1$.



For our purposes the conclusion in Theorem 10.17.2 is not strong enough; we need $\lambda_1$ to satisfy $|\lambda_k| < \lambda_1$. In
this case $\lambda_1$ would be called the dominant eigenvalue of L. However, as the following example shows, not all
Leslie matrices satisfy this condition.

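EXAMPLE 2 Leslie Matrix with No Dominant Eigenvalue

Let

\[
L = \begin{bmatrix}
0 & 0 & 6 \\
\tfrac{1}{2} & 0 & 0 \\
0 & \tfrac{1}{3} & 0
\end{bmatrix}
\]

Then the characteristic polynomial of L is

\[
p(\lambda) = |\lambda I - L| = \lambda^3 - 1
\]

The eigenvalues of L are thus the solutions of $\lambda^3 = 1$, namely,

\[
\lambda = 1, \quad -\tfrac{1}{2} + \tfrac{\sqrt{3}}{2}i, \quad -\tfrac{1}{2} - \tfrac{\sqrt{3}}{2}i
\]

All three eigenvalues have absolute value 1, so the unique positive eigenvalue $\lambda_1 = 1$ is not
dominant. Note that this matrix has the property that $L^3 = I$. This means that for any choice of the
initial age distribution $\mathbf{x}^{(0)}$, we have

\[
\mathbf{x}^{(0)} = \mathbf{x}^{(3)} = \mathbf{x}^{(6)} = \cdots = \mathbf{x}^{(3k)} = \cdots
\]

The age distribution vector thus oscillates with a period of three time units. Such oscillations (or
population waves, as they are called) could not occur if $\lambda_1$ were dominant, as we will see below.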
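The periodicity claimed in Example 2 can be confirmed with exact arithmetic. In this sketch the helper `matmul` and the starting vector of 100 females per class are illustrative assumptions; any starting vector would return to itself after three steps.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Leslie matrix of Example 2
L = [[0, 0, 6],
     [F(1, 2), 0, 0],
     [0, F(1, 3), 0]]

L3 = matmul(L, matmul(L, L))
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

x0 = [[100], [100], [100]]       # hypothetical initial age distribution
x1 = matmul(L, x0)
x2 = matmul(L, x1)
x3 = matmul(L, x2)
print(L3 == identity, x3 == x0)
```

The intermediate vectors `x1` and `x2` differ from `x0`, so the population really does cycle through three distinct age distributions rather than settling down.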



It is beyond the scope of this book to discuss necessary and sufficient conditions for $\lambda_1$ to be a dominant
eigenvalue. However, we will state the following sufficient condition without proof.



THEOREM 10.17.3 Dominant Eigenvalue

If two successive entries $a_i$ and $a_{i+1}$ in the first row of a Leslie matrix L are nonzero, then the
positive eigenvalue of L is dominant.



Thus, if the female population has two successive fertile age classes, then its Leslie matrix has a dominant 
eigenvalue. This is always the case for realistic populations if the duration of the age classes is sufficiently 
small. Note that in Example 2 there is only one fertile age class (the third), so the condition of Theorem 
10.17.3 is not satisfied. In what follows, we always assume that the condition of Theorem 10.17.3 is satisfied. 



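Let us assume that L is diagonalizable. This is not really necessary for the conclusions we will draw, but it
does simplify the arguments. In this case, L has n eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_n$, not necessarily distinct, and n
linearly independent eigenvectors, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, corresponding to them. In this listing we place the dominant
eigenvalue $\lambda_1$ first. We construct a matrix P whose columns are the eigenvectors of L:

\[
P = [\mathbf{x}_1 \mid \mathbf{x}_2 \mid \mathbf{x}_3 \mid \cdots \mid \mathbf{x}_n]
\]

The diagonalization of L is then given by the equation

\[
L = P \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix} P^{-1}
\]

From this it follows that

\[
L^k = P \begin{bmatrix}
\lambda_1^k & 0 & \cdots & 0 \\
0 & \lambda_2^k & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n^k
\end{bmatrix} P^{-1}
\]

for k = 1, 2, .... For any initial age distribution vector $\mathbf{x}^{(0)}$, we then have

\[
L^k \mathbf{x}^{(0)} = P \begin{bmatrix}
\lambda_1^k & 0 & \cdots & 0 \\
0 & \lambda_2^k & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n^k
\end{bmatrix} P^{-1} \mathbf{x}^{(0)}
\]

for k = 1, 2, .... Dividing both sides of this equation by $\lambda_1^k$ and using the fact that $\mathbf{x}^{(k)} = L^k\mathbf{x}^{(0)}$, we have

\[
\frac{1}{\lambda_1^k}\mathbf{x}^{(k)} = P \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & (\lambda_2/\lambda_1)^k & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & (\lambda_n/\lambda_1)^k
\end{bmatrix} P^{-1} \mathbf{x}^{(0)} \tag{9}
\]

Because $\lambda_1$ is the dominant eigenvalue, we have $|\lambda_i/\lambda_1| < 1$ for i = 2, 3, ..., n. It follows that

\[
(\lambda_i/\lambda_1)^k \to 0 \quad \text{as } k \to \infty \quad \text{for } i = 2, 3, \ldots, n
\]

Using this fact, we can take the limit of both sides of 9 to obtain

\[
\lim_{k \to \infty} \left\{ \frac{1}{\lambda_1^k}\mathbf{x}^{(k)} \right\} = P \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix} P^{-1} \mathbf{x}^{(0)} \tag{10}
\]

Let us denote the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$ by the constant c. As we ask you to show in
Exercise 4, the right side of 10 can be written as $c\mathbf{x}_1$, where c is a positive constant that depends only on the
initial age distribution vector $\mathbf{x}^{(0)}$. Thus, 10 becomes

\[
\lim_{k \to \infty} \left\{ \frac{1}{\lambda_1^k}\mathbf{x}^{(k)} \right\} = c\mathbf{x}_1 \tag{11}
\]

Equation 11 gives us the approximation

\[
\mathbf{x}^{(k)} \approx c\lambda_1^k \mathbf{x}_1 \tag{12}
\]

for large values of k. From 12 we also have

\[
\mathbf{x}^{(k-1)} \approx c\lambda_1^{k-1} \mathbf{x}_1 \tag{13}
\]

Comparing Equations 12 and 13, we see that

\[
\mathbf{x}^{(k)} \approx \lambda_1 \mathbf{x}^{(k-1)} \tag{14}
\]

for large values of k. This means that for large values of time, each age distribution vector is a scalar multiple
of the preceding age distribution vector, the scalar being the positive eigenvalue of the Leslie matrix.
Consequently, the proportion of females in each of the age classes becomes constant. As we will see in the
following example, these limiting proportions can be determined from the eigenvector $\mathbf{x}_1$.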
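Approximation 14 can be observed directly by iterating Equation 3 and comparing successive vectors entry by entry. The sketch below uses the Leslie matrix of Example 1, which has two successive fertile age classes and therefore a dominant eigenvalue; the iteration count of 200 and the variable names are arbitrary choices for illustration.

```python
def step(L, x):
    """One growth period: return L x (Equation 3)."""
    return [sum(l * xi for l, xi in zip(row, x)) for row in L]

# Leslie matrix of Example 1 (dominant eigenvalue assumed to exist by Theorem 10.17.3)
L = [[0.0, 4.0, 3.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0]]

x = [1000.0, 1000.0, 1000.0]
for k in range(200):                       # 200 periods stands in for "large k"
    x_next = step(L, x)
    ratios = [new / old for new, old in zip(x_next, x)]
    x = x_next

print([round(r, 6) for r in ratios])       # entrywise ratios x^(k) / x^(k-1)
```

All three entrywise ratios settle at the same number, the positive eigenvalue of L, which is what Equation 14 predicts.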



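EXAMPLE 3 Example 1 Revisited

The Leslie matrix in Example 1 was

\[
L = \begin{bmatrix}
0 & 4 & 3 \\
\tfrac{1}{2} & 0 & 0 \\
0 & \tfrac{1}{4} & 0
\end{bmatrix}
\]

Its characteristic polynomial is $p(\lambda) = \lambda^3 - 2\lambda - \tfrac{3}{8}$, and you can verify that the positive
eigenvalue is $\lambda_1 = \tfrac{3}{2}$. From 8 the corresponding eigenvector is

\[
\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \end{bmatrix}
= \begin{bmatrix} 1 \\ \tfrac{1}{2} \big/ \tfrac{3}{2} \\ \left(\tfrac{1}{2}\right)\left(\tfrac{1}{4}\right) \big/ \left(\tfrac{3}{2}\right)^2 \end{bmatrix}
= \begin{bmatrix} 1 \\ \tfrac{1}{3} \\ \tfrac{1}{18} \end{bmatrix}
\]

From 14 we have

\[
\mathbf{x}^{(k)} \approx \tfrac{3}{2}\mathbf{x}^{(k-1)}
\]

for large values of k. Hence, every five years the number of females in each of the three classes
will increase by about 50%, as will the total number of females in the population.

From 12 we have

\[
\mathbf{x}^{(k)} \approx c\left(\tfrac{3}{2}\right)^k \begin{bmatrix} 1 \\ \tfrac{1}{3} \\ \tfrac{1}{18} \end{bmatrix}
\]

Consequently, eventually the females will be distributed among the three age classes in the ratios
$1 : \tfrac{1}{3} : \tfrac{1}{18}$. This corresponds to a distribution of 72% of the females in the first age class, 24% of the
females in the second age class, and 4% of the females in the third age class.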
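The percentages in Example 3 come from normalizing the eigenvector of Equation 8 so that its entries sum to 1. A small sketch of that calculation follows; the function name `leslie_eigenvector` is illustrative, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction as F

def leslie_eigenvector(b, lam1):
    """Eigenvector of Equation 8: entries 1, b1/lam1, b1*b2/lam1^2, ..."""
    vec, product = [F(1)], F(1)
    for i, bi in enumerate(b, start=1):
        product *= bi
        vec.append(product / lam1 ** i)
    return vec

x1 = leslie_eigenvector([F(1, 2), F(1, 4)], F(3, 2))
total = sum(x1)
shares = [v / total for v in x1]           # limiting proportion in each age class
print(x1, [float(s) for s in shares])
```

The entries of `x1` are 1, 1/3, 1/18, and dividing by their sum 25/18 gives the proportions 18/25, 6/25, 1/25, that is, 72%, 24%, and 4%.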



EXAMPLE 4 Female Age Distribution for Humans

In this example we use birth and death parameters from the year 1965 for Canadian females.
Because few women over 50 years of age bear children, we restrict ourselves to the portion of the
female population between 0 and 50 years of age. The data are for 5-year age classes, so there are a
total of 10 age classes. Rather than writing out the 10 x 10 Leslie matrix in full, we list the birth
and death parameters as follows:

Age Interval    a_i        b_i
[0, 5)          0.00000    0.99651
[5, 10)         0.00024    0.99820
[10, 15)        0.05861    0.99802
[15, 20)        0.28608    0.99729
[20, 25)        0.44791    0.99694
[25, 30)        0.36399    0.99621
[30, 35)        0.22259    0.99460
[35, 40)        0.10457    0.99184
[40, 45)        0.02826    0.98700
[45, 50)        0.00240

Using numerical techniques, we can approximate the positive eigenvalue and corresponding
eigenvector by

\[
\lambda_1 = 1.07622 \quad \text{and} \quad
\mathbf{x}_1 = \begin{bmatrix}
1.00000 \\ 0.92594 \\ 0.85881 \\ 0.79641 \\ 0.73800 \\ 0.68364 \\ 0.63281 \\ 0.58482 \\ 0.53897 \\ 0.49429
\end{bmatrix}
\]

Thus, if Canadian women continued to reproduce and die as they did in 1965, eventually every 5
years their numbers would increase by 7.622%. From the eigenvector $\mathbf{x}_1$, we see that, in the limit,
for every 100,000 females between 0 and 5 years of age, there will be 92,594 females between 5
and 10 years of age, 85,881 females between 10 and 15 years of age, and so forth.
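The figures quoted in Example 4 can be cross-checked against Equations 6 to 8 without forming the 10 x 10 matrix. The sketch below takes the tabulated parameters and the stated value $\lambda_1 = 1.07622$ as given, evaluates $q(\lambda_1)$, and rebuilds the eigenvector from Equation 8; the variable names are illustrative, and the parameter values are assumed to have been read correctly from the table.

```python
# Birth parameters a_i and survival fractions b_i from the table in Example 4
a = [0.00000, 0.00024, 0.05861, 0.28608, 0.44791,
     0.36399, 0.22259, 0.10457, 0.02826, 0.00240]
b = [0.99651, 0.99820, 0.99802, 0.99729, 0.99694,
     0.99621, 0.99460, 0.99184, 0.98700]
lam1 = 1.07622                     # positive eigenvalue quoted in the example

# survival[i] = b_1 b_2 ... b_i (with survival[0] = 1)
survival = [1.0]
for bi in b:
    survival.append(survival[-1] * bi)

# q(lam1) from Equation 6; it should be close to 1 if lam1 is an eigenvalue
q_at_lam1 = sum(ai * s / lam1 ** (i + 1)
                for i, (ai, s) in enumerate(zip(a, survival)))

# Eigenvector from Equation 8
x1 = [s / lam1 ** i for i, s in enumerate(survival)]
print(round(q_at_lam1, 5), [round(v, 5) for v in x1])
```

If the value of `q_at_lam1` is close to 1 and the entries of `x1` match the column printed in the example to about four decimal places, the table and the quoted eigenpair are mutually consistent.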



Let us look again at Equation 12, which gives the age distribution vector of the population for large times:

\[
\mathbf{x}^{(k)} \approx c\lambda_1^k \mathbf{x}_1 \tag{15}
\]

Three cases arise according to the value of the positive eigenvalue $\lambda_1$:

(i) The population is eventually increasing if $\lambda_1 > 1$.

(ii) The population is eventually decreasing if $\lambda_1 < 1$.

(iii) The population eventually stabilizes if $\lambda_1 = 1$.

The case $\lambda_1 = 1$ is particularly interesting because it determines a population that has zero population
growth. For any initial age distribution, the population approaches a limiting age distribution that is some
multiple of the eigenvector $\mathbf{x}_1$. From Equations 6 and 7, we see that $\lambda_1 = 1$ is an eigenvalue if and only if

\[
a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} = 1 \tag{16}
\]

The expression

\[
R = a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \tag{17}
\]

is called the net reproduction rate of the population. (See Exercise 5 for a demographic interpretation of R.)
Thus, we can say that a population has zero population growth if and only if its net reproduction rate is 1.
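The net reproduction rate of Equation 17 is a running sum of birth parameters weighted by cumulative survival, so it can be computed in one pass. The sketch below is illustrative: the function names are hypothetical, and the classification in `trend` relies on the statement of Exercise 6 that the population is eventually increasing or decreasing according to whether R is greater or less than 1.

```python
from fractions import Fraction as F

def net_reproduction_rate(a, b):
    """R = a1 + a2*b1 + a3*b1*b2 + ... (Equation 17)."""
    rate, survival = 0, 1
    for i, ai in enumerate(a):
        if i > 0:
            survival *= b[i - 1]
        rate += ai * survival
    return rate

def trend(R):
    """Long-term behavior implied by R (see Exercise 6)."""
    if R > 1:
        return "eventually increasing"
    if R < 1:
        return "eventually decreasing"
    return "zero population growth"

# Animal population of Example 1
R = net_reproduction_rate([0, 4, 3], [F(1, 2), F(1, 4)])
print(R, float(R), trend(R))
```

For Example 1 this gives R = 19/8 = 2.375, in agreement with the answer to Exercise 7.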



Exercise Set 10.17 



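1. Suppose that a certain animal population is divided into two age classes and has a Leslie matrix

\[
L = \begin{bmatrix} 1 & \tfrac{3}{2} \\ \tfrac{1}{2} & 0 \end{bmatrix}
\]

(a) Calculate the positive eigenvalue $\lambda_1$ of L and the corresponding eigenvector $\mathbf{x}_1$.

(b) Beginning with the initial age distribution vector

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 100 \\ 0 \end{bmatrix}
\]

calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, $\mathbf{x}^{(3)}$, $\mathbf{x}^{(4)}$, and $\mathbf{x}^{(5)}$, rounding off to the nearest integer when necessary.

(c) Calculate $\mathbf{x}^{(6)}$ using the exact formula $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)}$ and using the approximation formula $\mathbf{x}^{(6)} \approx \lambda_1\mathbf{x}^{(5)}$.

Answer:

(a) $\lambda_1 = \tfrac{3}{2}$, $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{3} \end{bmatrix}$

(b) $\mathbf{x}^{(1)} = \begin{bmatrix} 100 \\ 50 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} 175 \\ 50 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} 250 \\ 88 \end{bmatrix}$, $\mathbf{x}^{(4)} = \begin{bmatrix} 382 \\ 125 \end{bmatrix}$, $\mathbf{x}^{(5)} = \begin{bmatrix} 570 \\ 191 \end{bmatrix}$

(c) $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)} = \begin{bmatrix} 857 \\ 285 \end{bmatrix}$, $\mathbf{x}^{(6)} \approx \lambda_1\mathbf{x}^{(5)} = \begin{bmatrix} 855 \\ 287 \end{bmatrix}$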

2. Find the characteristic polynomial of a general Leslie matrix given by Equation 4. 

3. (a) Show that the positive eigenvalue $\lambda_1$ of a Leslie matrix is always simple. Recall that a root $\lambda_0$ of a
polynomial $q(\lambda)$ is simple if and only if $q'(\lambda_0) \ne 0$.

(b) Show that the eigenspace corresponding to $\lambda_1$ has dimension 1.

4. Show that the right side of Equation 10 is $c\mathbf{x}_1$, where c is the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$.

5. Show that the net reproduction rate R, defined by 17, can be interpreted as the average number of
daughters born to a single female during her expected lifetime.



6. Show that a population is eventually decreasing if and only if its net reproduction rate is less than 1. 
Similarly, show that a population is eventually increasing if and only if its net reproduction rate is greater 
than 1. 

7. Calculate the net reproduction rate of the animal population in Example 1 . 
Answer: 

2.375 

8. (For readers with a hand calculator) Calculate the net reproduction rate of the Canadian female 
population in Example 4. 

Answer: 

1.49611 

9. (For readers who have read Section 10.1-Section 10.3) Prove Theorem 10.17.2. [Hint: Write $\lambda_k = re^{i\theta}$,
substitute into 7, take the real parts of both sides, and show that $r \le \lambda_1$.]



Section 10.17 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra 
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 



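T1. Consider the sequence of Leslie matrices

\[
L_2 = \begin{bmatrix} 0 & a \\ b_1 & 0 \end{bmatrix}, \quad
L_3 = \begin{bmatrix} 0 & 0 & a \\ b_1 & 0 & 0 \\ 0 & b_2 & 0 \end{bmatrix}, \quad
L_4 = \begin{bmatrix} 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 \\ 0 & 0 & b_3 & 0 \end{bmatrix}, \quad
L_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 & 0 \\ 0 & 0 & b_3 & 0 & 0 \\ 0 & 0 & 0 & b_4 & 0 \end{bmatrix}, \ldots
\]

(a) Use a computer to show that

\[
L_2^2 = I_2, \quad L_3^3 = I_3, \quad L_4^4 = I_4, \quad L_5^5 = I_5, \ldots
\]

for a suitable choice of a in terms of $b_1, b_2, \ldots, b_{n-1}$.

(b) From your results in part (a), conjecture a relationship between a and $b_1, b_2, \ldots, b_{n-1}$ that will make
$L_n^n = I_n$, where

\[
L_n = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & a \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
0 & 0 & b_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix}
\]

(c) Determine an expression for $p_n(\lambda) = |\lambda I_n - L_n|$ and use it to show that all eigenvalues of $L_n$ satisfy
$|\lambda| = 1$ when a and $b_1, b_2, \ldots, b_{n-1}$ are related by the equation determined in part (b).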



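T2. Consider the sequence of Leslie matrices

\[
L_3 = \begin{bmatrix} a & ap & ap^2 \\ b & 0 & 0 \\ 0 & b & 0 \end{bmatrix}, \quad
L_4 = \begin{bmatrix} a & ap & ap^2 & ap^3 \\ b & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & b & 0 \end{bmatrix}, \quad
L_5 = \begin{bmatrix} a & ap & ap^2 & ap^3 & ap^4 \\ b & 0 & 0 & 0 & 0 \\ 0 & b & 0 & 0 & 0 \\ 0 & 0 & b & 0 & 0 \\ 0 & 0 & 0 & b & 0 \end{bmatrix}, \ldots,
\]

\[
L_n = \begin{bmatrix}
a & ap & ap^2 & \cdots & ap^{n-2} & ap^{n-1} \\
b & 0 & 0 & \cdots & 0 & 0 \\
0 & b & 0 & \cdots & 0 & 0 \\
0 & 0 & b & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b & 0
\end{bmatrix}
\]

where $0 < p < 1$, $0 < b < 1$, and $1 < a$.

(a) Choose a value for n (say, n = 8). For various values of a, b, and p, use a computer to determine the
dominant eigenvalue of $L_n$, and then compare your results to the value of a + bp.

(b) Show that

\[
p_n(\lambda) = |\lambda I_n - L_n| = \lambda^n - a\left(\frac{\lambda^n - (bp)^n}{\lambda - bp}\right)
\]

which means that the eigenvalues of $L_n$ must satisfy

\[
\lambda^{n+1} - (a + bp)\lambda^n + a(bp)^n = 0
\]

(c) Can you now provide a rough proof to explain the fact that $\lambda_1 \approx a + bp$?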



T3. Suppose that a population of mice has a Leslie matrix L over a 1 -month period and an initial age 



distribution vector given by 



L = 



0 


0 


1 
1 

2 


A 

5 


2 

To 


A 

5 


0 


0 


0 


0 


0 


9 
10 


0 


0 


0 


0 


0 


9 

10 


0 


0 


0 


0 


0 


4 
5 


0 


0 


0 


0 


0 


3 
10 



^ 0 



and x® = 



50 

40 
30 
20 
10 
5 



(a) Compute the net reproduction rate of the population. 

(b) Compute the age distribution vector after 100 months and 101 months, and show that the vector after 101
months is approximately a scalar multiple of the vector after 100 months.

(c) Compute the dominant eigenvalue of L and its corresponding eigenvector. How are they related to your 
results in part (b)? 

(d) Suppose you wish to control the mouse population by feeding it a substance that decreases its age-specific 
birthrates (the entries in the first row of L) by a constant fraction. What range of fractions would cause the 
population eventually to decrease? 
10.18 Harvesting of Animal Populations



In this section we employ the Leslie matrix model of population growth to model the sustainable harvesting 
of an animal population. We also examine the effect of harvesting different fractions of different age groups. 




Prerequisites 

Age-Specific Population Growth (Section 10.17) 



Harvesting 

In Section 10.17 we used the Leslie matrix model to examine the growth of a female population that was 
divided into discrete age classes. In this section, we investigate the effects of harvesting an animal population 
growing according to such a model. By harvesting we mean the removal of animals from the population. 
(The word harvesting is not necessarily a euphemism for "slaughtering"; the animals may be removed from 
the population for other purposes.) 

In this section we restrict ourselves to sustainable harvesting policies. By this we mean the following: 




DEFINITION 1 



A harvesting policy in which an animal population is periodically harvested is said to be sustainable 
if the yield of each harvest is the same and the age distribution of the population remaining after each 
harvest is the same. 




Thus, the animal population is not depleted by a sustainable harvesting policy; only the excess growth is 
removed. 

As in Section 10.17, we will discuss only the females of the population. If the number of males in each age 
class is equal to the number of females — a reasonable assumption for many populations — then our harvesting 
policies will also apply to the male portion of the population. 



The Harvesting Model 

Figure 10.18.1 illustrates the basic idea of the model. We begin with a population having a particular age 
distribution. It undergoes a growth period that will be described by the Leslie matrix. At the end of the growth 
period, a certain fraction of each age class is harvested in such a way that the unharvested population has the 



same age distribution as the original population. This cycle repeats after each harvest so that the yield is 
sustainable. The duration of the harvest is assumed to be short in comparison with the growth period so that 
any growth or change in the population during the harvest period can be neglected. 



Figure 10.18.1  [Diagram: population before growth period; growth; population after growth period; harvest,
which divides the population into a harvested part and a not harvested part.]



To describe this harvesting model mathematically, let

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

be the age distribution vector of the population at the beginning of the growth period. Thus $x_i$ is the number
of females in the ith class left unharvested. As in Section 10.17, we require that the duration of each age class
be identical with the duration of the growth period. For example, if the population is harvested once a year,
then the population is divided into 1-year age classes.



If L is the Leslie matrix describing the growth of the population, then the vector $L\mathbf{x}$ is the age distribution
vector of the population at the end of the growth period, immediately before the periodic harvest. Let $h_i$, for
i = 1, 2, ..., n, be the fraction of females from the ith class that is harvested. We use these n numbers to form
an n x n diagonal matrix

\[
H = \begin{bmatrix}
h_1 & 0 & 0 & \cdots & 0 \\
0 & h_2 & 0 & \cdots & 0 \\
0 & 0 & h_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & h_n
\end{bmatrix}
\]

which we will call the harvesting matrix. By definition, we have

\[
0 \le h_i \le 1 \qquad (i = 1, 2, \ldots, n)
\]

That is, we can harvest none ($h_i = 0$), all ($h_i = 1$), or some fraction ($0 < h_i < 1$) of each of the n classes.
Because the number of females in the ith class immediately before each harvest is the ith entry $(L\mathbf{x})_i$ of the
vector $L\mathbf{x}$, the ith entry of the column vector

\[
HL\mathbf{x} = \begin{bmatrix} h_1 (L\mathbf{x})_1 \\ h_2 (L\mathbf{x})_2 \\ \vdots \\ h_n (L\mathbf{x})_n \end{bmatrix}
\]

is the number of females harvested from the ith class.



From the definition of a sustainable harvesting policy, we have

  [age distribution at end of growth period] - [harvest] = [age distribution at beginning of growth period]

or, mathematically,

\[
L\mathbf{x} - HL\mathbf{x} = \mathbf{x} \tag{1}
\]

If we write Equation 1 in the form

\[
(I - H)L\mathbf{x} = \mathbf{x} \tag{2}
\]

we see that $\mathbf{x}$ must be an eigenvector of the matrix $(I - H)L$ corresponding to the eigenvalue 1. As we will
now show, this places certain restrictions on the values of $h_i$ and $\mathbf{x}$.



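Suppose that the Leslie matrix of the population is

\[
L = \begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix} \tag{3}
\]

Then the matrix $(I - H)L$ is (verify)

\[
(I - H)L = \begin{bmatrix}
(1 - h_1)a_1 & (1 - h_1)a_2 & (1 - h_1)a_3 & \cdots & (1 - h_1)a_{n-1} & (1 - h_1)a_n \\
(1 - h_2)b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & (1 - h_3)b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (1 - h_n)b_{n-1} & 0
\end{bmatrix}
\]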

Thus, we see that $(I - H)L$ is a matrix with the same mathematical form as a Leslie matrix. In Section 10.17
we showed that a necessary and sufficient condition for a Leslie matrix to have 1 as an eigenvalue is that its
net reproduction rate also be 1 [see Eq. 16 of Section 10.17]. Calculating the net reproduction rate of
$(I - H)L$ and setting it equal to 1, we obtain (verify)

\[
(1 - h_1)\left[a_1 + a_2 b_1(1 - h_2) + a_3 b_1 b_2 (1 - h_2)(1 - h_3) + \cdots
+ a_n b_1 b_2 \cdots b_{n-1}(1 - h_2)(1 - h_3)\cdots(1 - h_n)\right] = 1 \tag{4}
\]

This equation places a restriction on the allowable harvesting fractions. Only those values of $h_1, h_2, \ldots, h_n$
that satisfy 4 and that lie in the interval [0, 1] can produce a sustainable yield.

If $h_1, h_2, \ldots, h_n$ do satisfy 4, then the matrix $(I - H)L$ has the desired eigenvalue $\lambda_1 = 1$. Furthermore, this
eigenvalue has multiplicity 1, because the positive eigenvalue of a Leslie matrix always has multiplicity 1
(Theorem 10.17.1). This means that there is only one linearly independent eigenvector $\mathbf{x}$ satisfying Equation
2. [See Exercise 3(b) of Section 10.17.] One possible choice for $\mathbf{x}$ is the following normalized eigenvector:

\[
\mathbf{x}_1 = \begin{bmatrix}
1 \\
b_1(1 - h_2) \\
b_1 b_2 (1 - h_2)(1 - h_3) \\
b_1 b_2 b_3 (1 - h_2)(1 - h_3)(1 - h_4) \\
\vdots \\
b_1 b_2 b_3 \cdots b_{n-1}(1 - h_2)(1 - h_3)\cdots(1 - h_n)
\end{bmatrix} \tag{5}
\]

Any other solution $\mathbf{x}$ of 2 is a multiple of $\mathbf{x}_1$. Thus, the vector $\mathbf{x}_1$ determines the proportion of females within
each of the n classes after a harvest under a sustainable harvesting policy. But there is an ambiguity in the
total number of females in the population after each harvest. This can be determined by some auxiliary
condition, such as an ecological or economic constraint. For example, for a population economically
supported by the harvester, the largest population the harvester can afford to raise between harvests would
determine the particular constant that $\mathbf{x}_1$ is multiplied by to produce the appropriate vector $\mathbf{x}$ in Equation 2.
For a wild population, the natural habitat of the population would determine how large the total population
could be between harvests.

Summarizing our results so far, we see that there is a wide choice in the values of $h_1, h_2, \ldots, h_n$ that will
produce a sustainable yield. But once these values are selected, the proportional age distribution of the
population after each harvest is uniquely determined by the normalized eigenvector $\mathbf{x}_1$ defined by Equation 5.
We now consider a few particular harvesting strategies of this type.



Uniform Harvesting 

With many populations it is difficult to distinguish or catch animals of specific ages. If animals are caught at
random, we can reasonably assume that the same fraction of each age class is harvested. We therefore set

\[
h = h_1 = h_2 = \cdots = h_n
\]

Equation 2 then reduces to (verify)

\[
L\mathbf{x} = \left(\frac{1}{1 - h}\right)\mathbf{x}
\]

Hence, $1/(1 - h)$ must be the unique positive eigenvalue $\lambda_1$ of the Leslie growth matrix L. That is,

\[
\lambda_1 = \frac{1}{1 - h}
\]

Solving for the harvesting fraction h, we obtain

\[
h = 1 - (1/\lambda_1) \tag{6}
\]

The vector $\mathbf{x}_1$, in this case, is the same as the eigenvector of L corresponding to the eigenvalue $\lambda_1$. From
Equation 8 of Section 10.17, this is

\[
\mathbf{x}_1 = \begin{bmatrix}
1 \\
b_1/\lambda_1 \\
b_1 b_2/\lambda_1^2 \\
\vdots \\
b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1}
\end{bmatrix} \tag{7}
\]

From 6 we can see that the larger $\lambda_1$ is, the larger is the fraction of animals we can harvest without depleting
the population. Note that we need $\lambda_1 > 1$ in order for the harvesting fraction h to lie in the interval (0, 1).
This is to be expected, because $\lambda_1 > 1$ is the condition that the population be increasing.
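The sustainability condition of Equation 2 can be verified for a concrete uniform harvest. The sketch below borrows the three-class Leslie matrix of Example 1 in Section 10.17, whose positive eigenvalue is 3/2; that choice of matrix, and the variable names, are assumptions made purely for illustration.

```python
from fractions import Fraction as F

# Leslie matrix of Example 1, Section 10.17 (positive eigenvalue 3/2)
L = [[0, 4, 3],
     [F(1, 2), 0, 0],
     [0, F(1, 4), 0]]
lam1 = F(3, 2)

h = 1 - 1 / lam1                                   # Equation 6
x1 = [F(1), F(1, 2) / lam1,                        # Equation 7
      F(1, 2) * F(1, 4) / lam1 ** 2]

grown = [sum(l * xi for l, xi in zip(row, x1)) for row in L]   # L x
harvested = [h * g for g in grown]                             # H L x with H = h I
remaining = [g - c for g, c in zip(grown, harvested)]          # (I - H) L x

print(h, remaining == x1)
```

Here one third of every age class is removed after each growth period, and the population left behind has exactly the age distribution it started with, as a sustainable policy requires.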

EXAMPLE 1 Harvesting Sheep < 



For a certain species of domestic sheep in New Zealand with a growth period of 1 year, the 
following Leslie matrix was found (see G. Caughley, "Parameters for Seasonally Breeding 
Populations," £'co/ogv, 48, 1967, pp. 834-839). 



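$$L = \left[\begin{array}{cccccccccccc}
.000 & .045 & .391 & .472 & .484 & .546 & .543 & .502 & .468 & .459 & .433 & .421 \\
.845 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & .975 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & .965 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & .950 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & .926 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & .895 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & .850 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & .786 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .691 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .561 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .370 & 0
\end{array}\right]$$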



The sheep have a lifespan of 12 years, so they are divided into 12 age classes of duration 1 year 
each. By the use of numerical techniques, the unique positive eigenvalue of $L$ can be found to 
be 

$$\lambda_1 = 1.176$$

From Equation 6, the harvesting fraction $h$ is 

$$h = 1 - (1/\lambda_1) = 1 - (1/1.176) = .150$$

Thus, the uniform harvesting policy is one in which 15.0 % of the sheep from each of the 12 
age classes is harvested every year. From 7 the age distribution vector of the sheep after each 
harvest is proportional to 



$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ 0.719 \\ 0.596 \\ 0.489 \\ 0.395 \\ 0.311 \\ 0.237 \\ 0.171 \\ 0.114 \\ 0.067 \\ 0.032 \\ 0.010 \end{bmatrix} \qquad (8)$$



From 8 we see that for every 1000 sheep between 0 and 1 year of age that are not harvested, 
there are 719 sheep between 1 and 2 years of age, 596 sheep between 2 and 3 years of age, and 
so forth. 
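
The numbers in Example 1 can be checked with a short computation. The sketch below is not the numerical technique the text has in mind; it assumes the standard fact that $\lambda$ is the positive eigenvalue of a Leslie matrix exactly when $a_1/\lambda + a_2b_1/\lambda^2 + \cdots + a_nb_1\cdots b_{n-1}/\lambda^n = 1$, and solves that equation by bisection. The names `survivorship`, `q`, and `dominant_eigenvalue` are illustrative only.

```python
# Fertility row (a_1..a_12) and survival subdiagonal (b_1..b_11) of the sheep matrix L.
A = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546,
     0.543, 0.502, 0.468, 0.459, 0.433, 0.421]
B = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895,
     0.850, 0.786, 0.691, 0.561, 0.370]


def survivorship(b):
    """Return [1, b1, b1*b2, ..., b1*b2*...*b_{n-1}]."""
    p = [1.0]
    for rate in b:
        p.append(p[-1] * rate)
    return p


def q(lam, a, b):
    """a1/lam + a2*b1/lam^2 + ...; equals 1 at the positive eigenvalue (assumed fact)."""
    return sum(ai * pi / lam ** (i + 1)
               for i, (ai, pi) in enumerate(zip(a, survivorship(b))))


def dominant_eigenvalue(a, b, lo=0.5, hi=5.0, tol=1e-12):
    """Bisection on q(lam) = 1; assumes q(lo) > 1 > q(hi), and q decreases in lam."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid, a, b) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam1 = dominant_eigenvalue(A, B)
h = 1.0 - 1.0 / lam1                      # Equation 6
x1 = [p / lam1 ** i for i, p in enumerate(survivorship(B))]  # Equation 7

print("lambda_1 =", round(lam1, 4))
print("h        =", round(h, 4))
print("x_1      =", [round(x, 3) for x in x1])
```

The printed values should land close to those quoted in the example; small differences in the last digit can arise because the text rounds $\lambda_1$ before computing $h$.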



Harvesting Only the Youngest Age Class 

In some populations only the youngest females are of any economic value, so the harvester seeks to harvest 
only the females from the youngest age class. Accordingly, let us set 

$$h_1 = h$$

$$h_2 = h_3 = \cdots = h_n = 0$$

Equation 4 then reduces to 

$$(1-h)(a_1 + a_2b_1 + a_3b_1b_2 + \cdots + a_nb_1b_2\cdots b_{n-1}) = 1$$

or 

$$(1-h)R = 1$$

where $R$ is the net reproduction rate of the population. [See Equation 17 of Section 10.17.] Solving for $h$, we 
obtain 

$$h = 1 - (1/R) \qquad (9)$$

Note from this equation that a sustainable harvesting policy is possible only if $R > 1$. This is reasonable 
because only if $R > 1$ is the population increasing. From Equation 5, the age distribution vector after each 
harvest is proportional to the vector 

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1 \\ b_1b_2 \\ \vdots \\ b_1b_2\cdots b_{n-1} \end{bmatrix} \qquad (10)$$



EXAMPLE 2 Sustainable Harvesting Policy 



Let us apply this type of sustainable harvesting policy to the sheep population in Example 1 . 
For the net reproduction rate of the population we find 

$$\begin{aligned} R &= a_1 + a_2b_1 + a_3b_1b_2 + \cdots + a_nb_1b_2\cdots b_{n-1} \\ &= (.000) + (.045)(.845) + \cdots + (.421)(.845)(.975)\cdots(.370) \\ &= 2.514 \end{aligned}$$

From Equation 9, the fraction of the first age class harvested is 

$$h = 1 - (1/R) = 1 - (1/2.514) = .602$$

From Equation 10, the age distribution of the sheep population after the harvest is proportional 
to the vector 



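$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ .845 \\ (.845)(.975) \\ (.845)(.975)(.965) \\ \vdots \\ (.845)(.975)\cdots(.370) \end{bmatrix} = \begin{bmatrix} 1.000 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \qquad (11)$$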



A direct calculation gives us the following (see also Exercise 3): 



$$L\mathbf{x}_1 = \begin{bmatrix} 2.514 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \qquad (12)$$



The vector $L\mathbf{x}_1$ is the age distribution vector immediately before the harvest. The total of all 
entries in $L\mathbf{x}_1$ is 8.520, so the first entry 2.514 is 29.5% of the total. This means that 
immediately before each harvest, 29.5% of the population is in the youngest age class. Since 
60.2% of this class is harvested, it follows that 17.8% (= 60.2% of 29.5%) of the entire sheep 
population is harvested each year. This can be compared with the uniform harvesting policy of 
Example 1, in which 15.0% of the sheep population is harvested each year. 
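
The arithmetic of Example 2 can be reproduced in a few lines. This is a minimal sketch that takes the birth and survival parameters from the sheep matrix of Example 1; the variable names are illustrative only.

```python
# Fertility row (a_1..a_12) and survival subdiagonal (b_1..b_11) of the sheep matrix L.
A = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546,
     0.543, 0.502, 0.468, 0.459, 0.433, 0.421]
B = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895,
     0.850, 0.786, 0.691, 0.561, 0.370]

# x1 = [1, b1, b1*b2, ...]: age distribution after each harvest (Equation 10).
x1 = [1.0]
for rate in B:
    x1.append(x1[-1] * rate)

# Net reproduction rate R = a1 + a2*b1 + a3*b1*b2 + ...
R = sum(a * x for a, x in zip(A, x1))

# Fraction of the youngest class that is harvested (Equation 9).
h = 1.0 - 1.0 / R

# L x1: the first entry is R, the remaining entries are b_i times the previous class.
before_harvest = [R] + [B[i] * x1[i] for i in range(len(B))]
total = sum(before_harvest)

# Only the youngest class is harvested, so the yield is h * (first entry) / total.
yield_fraction = h * before_harvest[0] / total

print("R     =", round(R, 3))
print("h     =", round(h, 3))
print("total =", round(total, 3))
print("yield =", round(100 * yield_fraction, 1), "% of the population")
```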



We saw in Example 1 that a sustainable harvesting policy in which the same fraction of each age class is 
harvested produces a yield of 15.0 % of the sheep population. In Example 2 we saw that if only the youngest 
age class is harvested, the resulting yield is 17.8 % of the population. There are many other possible 
sustainable harvesting policies, and each generally provides a different yield. It would be of interest to find a 
sustainable harvesting policy that produces the largest possible yield. Such a policy is called an optimal 
sustainable harvesting policy, and the resulting yield is called the optimal sustainable yield. However, 
determining the optimal sustainable yield requires linear programming theory, which we will not discuss here. 
We refer you to the following result, which appears in J. R. Beddington and D. B. Taylor, "Optimum Age 
Specific Harvesting of a Population," Biometrics, 29, 1973, pp. 801-809. 



THEOREM 10.18.1 Optimal Sustainable Yield 

An optimal sustainable harvesting policy is one in which either one or two age classes are harvested. 
If two age classes are harvested, then the older age class is completely harvested. 



Optimal Sustainable Yield 



As an illustration, it can be shown that the optimal sustainable yield of the sheep population is attained when 



$$h_1 = 0.522, \qquad h_9 = 1.000 \qquad (13)$$

and all other values of $h_i$ are zero. Thus, 52.2% of the sheep between 0 and 1 year of age and all the sheep 
between 8 and 9 years of age are harvested. As we ask you to show in Exercise 2, the resulting optimal 
sustainable yield is 19.9% of the population. 



Exercise Set 10.18 

1. Let a certain animal population be divided into three 1-year age classes and have as its Leslie matrix 

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \end{bmatrix}$$



(a) Find the yield and the age distribution vector after each harvest if the same fraction of each of the 
three age classes is harvested every year. 

(b) Find the yield and the age distribution vector after each harvest if only the youngest age class is 
harvested every year. Also, find the fraction of the youngest age class that is harvested. 

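Answer: 

(a) Yield = $33\tfrac{1}{3}$% of population; $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{3} \\ \tfrac{1}{18} \end{bmatrix}$

(b) Yield = 45.8% of population; $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{2} \\ \tfrac{1}{8} \end{bmatrix}$; harvest 57.9% of youngest age class 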



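2. For the optimal sustainable harvesting policy described by Equations 13, find the vector $\mathbf{x}_1$ that specifies 
the age distribution of the population after each harvest. Also calculate the vector $L\mathbf{x}_1$ and verify that the 
optimal sustainable yield is 19.9% of the population. 

Answer: 

$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ .845 \\ .824 \\ .795 \\ .755 \\ .699 \\ .626 \\ .532 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad L\mathbf{x}_1 = \begin{bmatrix} 2.090 \\ .845 \\ .824 \\ .795 \\ .755 \\ .699 \\ .626 \\ .532 \\ .418 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

$$\frac{1.090 + .418}{7.584} = .199$$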



3. Use Equation 10 to show that if only the first age class of an animal population is harvested, then 

$$L\mathbf{x}_1 - \mathbf{x}_1 = \begin{bmatrix} R - 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

where R is the net reproduction rate of the population. 

4. If only the $I$th class of an animal population is to be periodically harvested ($I = 1, 2, \ldots, n$), find the 
corresponding harvesting fraction $h_I$. 

Answer: 

$$h_I = \frac{R - 1}{a_Ib_1b_2\cdots b_{I-1} + \cdots + a_nb_1b_2\cdots b_{n-1}}$$

5. Suppose that all of the $J$th class and a certain fraction $h_I$ of the $I$th class of an animal population is to be 
periodically harvested ($1 \le I < J \le n$). Calculate $h_I$. 

Answer: 

$$h_I = \frac{a_1 + a_2b_1 + \cdots + (a_{J-1}b_1b_2\cdots b_{J-2}) - 1}{a_Ib_1b_2\cdots b_{I-1} + \cdots + a_{J-1}b_1b_2\cdots b_{J-2}}$$

Section 10.18 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 



T1. The results of Theorem 10.18.1 suggest the following algorithm for determining the optimal sustainable 
yield. 

1. For each value of $i = 1, 2, \ldots, n$, set $h_i = h$ and $h_k = 0$ for $k \ne i$ and calculate the respective yields. These 
$n$ calculations give the one-age-class results. Of course, any calculation leading to a value of $h$ not between 
0 and 1 is rejected. 

2. For each value of $i = 1, 2, \ldots, n - 1$ and $j = i + 1, i + 2, \ldots, n$, set $h_i = h$, $h_j = 1$, and $h_k = 0$ for $k \ne i, j$ 
and calculate the respective yields. These $\tfrac{1}{2}n(n-1)$ calculations give the two-age-class results. Of 
course, any calculation leading to a value of $h$ not between 0 and 1 is again rejected. 

3. Of the yields calculated in steps 1 and 2, the largest is the optimal sustainable yield. Note that there will 
be at most 

$$n + \tfrac{1}{2}n(n-1) = \tfrac{1}{2}n(n+1)$$

calculations in all. Once again, some of these may lead to a value of $h$ not between 0 and 1 and must 
therefore be rejected. 

If we use this algorithm for the sheep example in the text, there will be at most $\tfrac{1}{2}(12)(12+1) = 78$ 
calculations to consider. Use a computer to do the two-age-class calculations for $h_1 = h$, $h_j = 1$, and $h_k = 0$ 
for $k \ne 1$ or $j$ for $j = 2, 3, \ldots, 12$. Construct a summary table consisting of the values of $h_1$ and the 
percentage yields using $j = 2, 3, \ldots, 12$, which will show that the largest of these yields occurs when $j = 9$. 
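
One possible way to organize the search described in this exercise is sketched below in Python, as a starting point rather than a prescribed solution. It assumes the sheep data of Example 1 and uses the observation that the sustainability condition $(1-h_1)(a_1x_1 + \cdots + a_nx_n) = 1$ is affine in the single unknown fraction $h$, so two evaluations determine $h$. All function names are hypothetical.

```python
# Fertility row and survival subdiagonal of the sheep matrix L from Example 1.
A = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546,
     0.543, 0.502, 0.468, 0.459, 0.433, 0.421]
B = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895,
     0.850, 0.786, 0.691, 0.561, 0.370]


def after_harvest(b, h):
    """After-harvest age distribution scaled so that the first entry is 1."""
    x = [1.0]
    for i in range(1, len(h)):
        x.append((1.0 - h[i]) * b[i - 1] * x[i - 1])
    return x


def reproduction(a, b, h):
    """(1 - h_1)(a_1 x_1 + ... + a_n x_n); equals 1 for a sustainable policy."""
    x = after_harvest(b, h)
    return (1.0 - h[0]) * sum(ai * xi for ai, xi in zip(a, x))


def yield_fraction(a, b, h):
    """Harvested animals divided by the population just before the harvest."""
    x = after_harvest(b, h)
    before = [sum(ai * xi for ai, xi in zip(a, x))]
    before += [b[i - 1] * x[i - 1] for i in range(1, len(a))]
    return sum(hi * ci for hi, ci in zip(h, before)) / sum(before)


def solve_fraction(a, b, i, j=None):
    """Fraction h for class i (0-based); class j is fully harvested if given."""
    n = len(a)
    h0, h1 = [0.0] * n, [0.0] * n
    h1[i] = 1.0
    if j is not None:
        h0[j] = h1[j] = 1.0
    r0, r1 = reproduction(a, b, h0), reproduction(a, b, h1)
    if r0 == r1:
        return None
    h = (r0 - 1.0) / (r0 - r1)        # affine interpolation to reproduction = 1
    return h if 0.0 < h < 1.0 else None


def search(a, b):
    """Map (i, j) in 1-based class numbers to (h, yield) for every accepted policy."""
    n = len(a)
    cases = [(i, None) for i in range(n)]
    cases += [(i, j) for i in range(n - 1) for j in range(i + 1, n)]
    results = {}
    for i, j in cases:
        h = solve_fraction(a, b, i, j)
        if h is None:
            continue                   # rejected: h not between 0 and 1
        policy = [0.0] * n
        policy[i] = h
        if j is not None:
            policy[j] = 1.0
        key = (i + 1, None if j is None else j + 1)
        results[key] = (h, yield_fraction(a, b, policy))
    return results


results = search(A, B)
for j in range(2, 13):
    if (1, j) in results:
        h, y = results[(1, j)]
        print(f"j = {j:2d}   h_1 = {h:.3f}   yield = {100 * y:.1f}%")
```

The entry `results[(1, None)]` corresponds to the policy of Example 2, and `results[(1, 9)]` to the policy in Equations 13.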

T2. Using the algorithm in Exercise T1, do the one-age-class calculations for $h_i = h$ and $h_k = 0$ for $k \ne i$ for 
$i = 1, 2, \ldots, 12$. Construct a summary table consisting of the values of $h_i$ and the percentage yields using 
$i = 1, 2, \ldots, 12$, which will show that the largest of these yields occurs when $i = 9$. 

T3. Referring to the mouse population in Exercise T3 of Section 10.17, suppose that reducing the birthrates 
is not practical, so you instead decide to control the population by uniformly harvesting all of the age classes 
monthly. 

(a) What fraction of the population must be harvested monthly to bring the mouse population to equilibrium 
eventually? 

(b) What is the equilibrium age distribution vector under this uniform harvesting policy? 

(c) The total number of mice in the original mouse population was 155. What would be the total number of 
mice after 5, 10, and 200 months under your uniform harvesting policy? 
10.19 A Least Squares Model for Human Hearing 

In this section we apply the method of least squares approximation to a model for human hearing. The use of this 
method is motivated by energy considerations. 



Prerequisites 

Inner Product Spaces 
Orthogonal Projection 
Fourier Series (Section 6.6) 



Anatomy of the Ear 

We begin with a brief discussion of the nature of sound and human hearing. Figure 10.19.1 is a schematic diagram 
of the ear showing its three main components: the outer ear, middle ear, and inner ear. Sound waves enter the outer 
ear where they are channeled to the eardrum, causing it to vibrate. Three tiny bones in the middle ear mechanically 
link the eardrum with the snail-shaped cochlea within the inner ear. These bones pass on the vibrations of the 
eardrum to a fluid within the cochlea. The cochlea contains thousands of minute hairs that oscillate with the fluid. 
Those near the entrance of the cochlea are stimulated by high frequencies, and those near the tip are stimulated by 
low frequencies. The movements of these hairs activate nerve cells that send signals along various neural pathways 
to the brain, where the signals are interpreted as sound. 




Figure 10.19.1 



The sound waves themselves are variations in time of the air pressure. For the auditory system, the most 
elementary type of sound wave is a sinusoidal variation in the air pressure. This type of sound wave stimulates the 
hairs within the cochlea in such a way that a nerve impulse along a single neural pathway is produced (Figure 
10.19.2). A sinusoidal sound wave can be described by a function of time 

$$q(t) = A_0 + A\sin(\omega t - \delta) \qquad (1)$$

where $q(t)$ is the atmospheric pressure at the eardrum, $A_0$ is the normal atmospheric pressure, $A$ is the maximum 
deviation of the pressure from the normal atmospheric pressure, $\omega/2\pi$ is the frequency of the wave in cycles per 
second, and $\delta$ is the phase angle of the wave. To be perceived as sound, such sinusoidal waves must have 
frequencies within a certain range. For humans this range is roughly 20 cycles per second (cps) to 20,000 cps. 
Frequencies outside this range will not stimulate the hairs within the cochlea enough to produce nerve signals. 





Neural pathways 
to brain 



Figure 10.19.2 

To a reasonable degree of accuracy, the ear is a linear system. This means that if a complex sound wave is a finite 
sum of sinusoidal components of different amplitudes, frequencies, and phase angles, say, 

$$q(t) = A_0 + A_1\sin(\omega_1t - \delta_1) + A_2\sin(\omega_2t - \delta_2) + \cdots + A_n\sin(\omega_nt - \delta_n) \qquad (2)$$



then the response of the ear consists of nerve impulses along the same neural pathways that would be stimulated by 
the individual components (Figure 10.19.3). 




Figure 10.19.3 



Let us now consider some periodic sound wave $p(t)$ with period $T$ [i.e., $p(t) = p(t + T)$] that is not a finite sum 
of sinusoidal waves. If we examine the response of the ear to such a periodic wave, we find that it is the same as 
the response to some wave that is the sum of sinusoidal waves. That is, there is some sound wave $q(t)$ as given by 
Equation 2 that produces the same response as $p(t)$, even though $p(t)$ and $q(t)$ are different functions of time. 



We now want to determine the frequencies, amplitudes, and phase angles of the sinusoidal components of $q(t)$. 
Because $q(t)$ produces the same response as the periodic wave $p(t)$, it is reasonable to expect that $q(t)$ has the 
same period $T$ as $p(t)$. This requires that each sinusoidal term in $q(t)$ have period $T$. Consequently, the frequencies 
of the sinusoidal components must be integer multiples of the basic frequency $1/T$ of the function $p(t)$. Thus, the 
$\omega_k$ in Equation 2 must be of the form 

$$\omega_k = 2k\pi/T, \qquad k = 1, 2, \ldots$$

But because the ear cannot perceive sinusoidal waves with frequencies greater than 20,000 cps, we may omit those 
values of $k$ for which $\omega_k/2\pi = k/T$ is greater than 20,000. Thus, $q(t)$ is of the form 

$$q(t) = A_0 + A_1\sin\left(\frac{2\pi t}{T} - \delta_1\right) + \cdots + A_n\sin\left(\frac{2n\pi t}{T} - \delta_n\right) \qquad (3)$$

where $n$ is the largest integer such that $n/T$ is not greater than 20,000. 
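
As a small illustration of this cutoff, the number of sinusoidal terms that survive depends only on the basic frequency $1/T$. The helper below is a sketch with a hypothetical name; it works with the basic frequency in cps rather than with $T$ to avoid floating-point rounding in $1/T$.

```python
def audible_harmonics(base_frequency_cps, limit_cps=20000):
    """Largest n with n * (basic frequency) not greater than the audible limit."""
    return int(limit_cps // base_frequency_cps)


for f0 in (5000, 440, 60):
    n = audible_harmonics(f0)
    print(f"basic frequency {f0} cps: n = {n}, highest retained frequency = {n * f0} cps")
```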

We now turn our attention to the values of the amplitudes $A_0, A_1, \ldots, A_n$ and the phase angles $\delta_1, \delta_2, \ldots, \delta_n$ that 
appear in Equation 3. There is some criterion by which the auditory system "picks" these values so that $q(t)$ 
produces the same response as $p(t)$. To examine this criterion, let us set 

$$e(t) = p(t) - q(t)$$

If we consider $q(t)$ as an approximation to $p(t)$, then $e(t)$ is the error in this approximation, an error that the ear 
cannot perceive. In terms of $e(t)$, the criterion for the determination of the amplitudes and the phase angles is that 
the quantity 

$$\int_0^T [e(t)]^2\,dt = \int_0^T [p(t) - q(t)]^2\,dt \qquad (4)$$

be as small as possible. We cannot go into the physiological reasons for this, but we note that this expression is 
proportional to the acoustic energy of the error wave over one period. In other words, it is the energy of the 
difference between the two sound waves $p(t)$ and $q(t)$ that determines whether the ear perceives any difference 
between them. If this energy is as small as possible, then the two waves produce the same sensation of sound. 
Mathematically, the function $q(t)$ in 4 is the least squares approximation to $p(t)$ from the vector space $C[0, T]$ of 
continuous functions on the interval $[0, T]$. (See Section 6.6.) 

Least squares approximations by continuous functions arise in a wide variety of engineering and scientific 
approximation problems. Apart from the acoustics problem just discussed, some other examples follow. 

1. Let $S(x)$ be the axial strain distribution in a uniform rod lying along the $x$-axis from $x = 0$ to $x = l$ (Figure 
10.19.4). The strain energy in the rod is proportional to the integral 

$$\int_0^l [S(x)]^2\,dx$$

The closeness of an approximation $q(x)$ to $S(x)$ can be judged according to the strain energy of the difference 
of the two strain distributions. That energy is proportional to 

$$\int_0^l [S(x) - q(x)]^2\,dx$$

which is a least squares criterion. 

2. Let $E(t)$ be a periodic voltage across a resistor in an electrical circuit (Figure 10.19.5). The electrical energy 
transferred to the resistor during one period $T$ is proportional to 

$$\int_0^T [E(t)]^2\,dt$$

If $q(t)$ has the same period as $E(t)$ and is to be an approximation to $E(t)$, then the criterion of closeness might 
be taken as the energy of the difference voltage. This is proportional to 

$$\int_0^T [E(t) - q(t)]^2\,dt$$

which is again a least squares criterion. 
3. Let $y(x)$ be the vertical displacement of a uniform flexible string whose equilibrium position is along the $x$-axis 
from $x = 0$ to $x = l$ (Figure 10.19.6). The elastic potential energy of the string is proportional to 

$$\int_0^l \left[\frac{dy}{dx}\right]^2 dx$$

If $q(x)$ is to be an approximation to the displacement, then as before, the energy integral 

$$\int_0^l \left[\frac{dy}{dx} - \frac{dq}{dx}\right]^2 dx$$

determines a least squares criterion for the closeness of the approximation. 




Figure 10.19.4 

Figure 10.19.5 

Figure 10.19.6 

Least squares approximation is also used in situations where there is no a priori justification for its use, such as for 
approximating business cycles, population growth curves, sales curves, and so forth. It is used in these cases 
because of its mathematical simplicity. In general, if no other error criterion is immediately apparent for an 
approximation problem, the least squares criterion is the one most often chosen. 



The following result was obtained in Section 6.6. 



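THEOREM 10.19.1 Minimizing Mean Square Error on $[0, 2\pi]$ 

If $f(t)$ is continuous on $[0, 2\pi]$, then the trigonometric function $g(t)$ of the form 

$$g(t) = \tfrac{1}{2}a_0 + a_1\cos t + \cdots + a_n\cos nt + b_1\sin t + \cdots + b_n\sin nt$$

that minimizes the mean square error 

$$\int_0^{2\pi} [f(t) - g(t)]^2\,dt$$

has coefficients 

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt, \qquad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt, \qquad k = 1, 2, \ldots, n$$

If the original function $f(t)$ is defined over the interval $[0, T]$ instead of $[0, 2\pi]$, a change of scale will yield the 
following result (see Exercise 8): 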



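THEOREM 10.19.2 Minimizing Mean Square Error on $[0, T]$ 

If $f(t)$ is continuous on $[0, T]$, then the trigonometric function $g(t)$ of the form 

$$g(t) = \tfrac{1}{2}a_0 + a_1\cos\frac{2\pi}{T}t + \cdots + a_n\cos\frac{2n\pi}{T}t + b_1\sin\frac{2\pi}{T}t + \cdots + b_n\sin\frac{2n\pi}{T}t$$

that minimizes the mean square error 

$$\int_0^T [f(t) - g(t)]^2\,dt$$

has coefficients 

$$a_k = \frac{2}{T}\int_0^T f(t)\cos\frac{2k\pi t}{T}\,dt, \qquad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{2}{T}\int_0^T f(t)\sin\frac{2k\pi t}{T}\,dt, \qquad k = 1, 2, \ldots, n$$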

EXAMPLE 1 Least Squares Approximation to a Sound Wave 

Let a sound wave $p(t)$ have a saw-tooth pattern with a basic frequency of 5000 cps (Figure 10.19.7). 
Assume units are chosen so that the normal atmospheric pressure is at the zero level and the 
maximum amplitude of the wave is $A$. The basic period of the wave is $T = 1/5000 = .0002$ second. 
From $t = 0$ to $t = T$ the function $p(t)$ has the equation 

$$p(t) = \frac{2A}{T}\left(\frac{T}{2} - t\right)$$

Theorem 10.19.2 then yields the following (verify): 

$$a_0 = \frac{2}{T}\int_0^T p(t)\,dt = 0$$

$$a_k = \frac{2}{T}\int_0^T p(t)\cos\frac{2k\pi t}{T}\,dt = 0, \qquad k = 1, 2, \ldots$$

$$b_k = \frac{2}{T}\int_0^T p(t)\sin\frac{2k\pi t}{T}\,dt = \frac{2A}{k\pi}, \qquad k = 1, 2, \ldots$$

We can now investigate how the sound wave $p(t)$ is perceived by the human ear. We note that 
$4/T = 20{,}000$ cps, so we need only go up to $k = 4$ in the formulas above. The least squares 
approximation to $p(t)$ is then 

$$q(t) = \frac{2A}{\pi}\left[\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right]$$

The four sinusoidal terms have frequencies of 5000, 10,000, 15,000, and 20,000 cps, respectively. In 
Figure 10.19.8 we have plotted $p(t)$ and $q(t)$ over one period. Although $q(t)$ is not a very good 
point-by-point approximation to $p(t)$, to the ear, both $p(t)$ and $q(t)$ produce the same sensation of 
sound. 
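
The coefficient formulas of Theorem 10.19.2 are easy to check numerically. The sketch below approximates the integrals by the midpoint rule and applies them to a saw-tooth wave. It assumes the saw-tooth falls linearly from $A$ at $t = 0$ to $-A$ at $t = T$, that is, $p(t) = (2A/T)(T/2 - t)$, and takes $A = 1$ for illustration; `fourier_coefficients` is a hypothetical helper, not a library routine.

```python
import math


def fourier_coefficients(f, T, n, samples=20000):
    """Midpoint-rule approximations to a_0..a_n and b_1..b_n of Theorem 10.19.2."""
    dt = T / samples
    ts = [(i + 0.5) * dt for i in range(samples)]
    values = [f(t) for t in ts]
    a = [2.0 / T * dt * sum(v * math.cos(2 * math.pi * k * t / T)
                            for v, t in zip(values, ts))
         for k in range(n + 1)]
    b = [2.0 / T * dt * sum(v * math.sin(2 * math.pi * k * t / T)
                            for v, t in zip(values, ts))
         for k in range(1, n + 1)]
    return a, b


AMP = 1.0            # maximum amplitude A (assumed value for illustration)
T = 1.0 / 5000       # basic period for a basic frequency of 5000 cps


def p(t):
    """Assumed saw-tooth: falls linearly from AMP at t = 0 to -AMP at t = T."""
    return (2.0 * AMP / T) * (T / 2.0 - t)


a, b = fourier_coefficients(p, T, 4)


def q(t):
    """Least squares approximation built from the computed coefficients."""
    total = a[0] / 2.0
    for k in range(1, 5):
        angle = 2 * math.pi * k * t / T
        total += a[k] * math.cos(angle) + b[k - 1] * math.sin(angle)
    return total


for k in range(1, 5):
    print(f"k = {k}: b_k = {b[k - 1]:.6f}, 2A/(k pi) = {2 * AMP / (k * math.pi):.6f}")
```

Comparing the two printed columns gives a quick check of the closed-form coefficients, and `q` can be sampled over one period to reproduce a plot like the one described in the example.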




Figure 10.19.7 
Figure 10.19.8 



As discussed in Section 6.6, the least squares approximation becomes better as the number of terms in the 
approximating trigonometric polynomial becomes larger. More precisely, 

$$\int_0^{2\pi}\left[f(t) - \tfrac{1}{2}a_0 - \sum_{k=1}^{n}(a_k\cos kt + b_k\sin kt)\right]^2 dt$$

tends to zero as $n$ approaches infinity. We denote this by writing 

$$f(t) \sim \tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

where the right side of this equation is the Fourier series of $f(t)$. Whether the Fourier series of $f(t)$ converges to 
$f(t)$ for each $t$ is another question, and a more difficult one. For most continuous functions encountered in 
applications, the Fourier series does indeed converge to its corresponding function for each value of $t$. 



Exercise Set 10.19 

1. Find the trigonometric polynomial of order 3 that is the least squares approximation to the function 
$f(t) = (t - \pi)^2$ over the interval $[0, 2\pi]$. 

Answer: 

$$\frac{\pi^2}{3} + 4\cos t + \cos 2t + \frac{4}{9}\cos 3t$$

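2. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function $f(t) = t^2$ 
over the interval $[0, T]$. 

Answer: 

$$\frac{T^2}{3} + \frac{T^2}{\pi^2}\left(\cos\frac{2\pi}{T}t + \frac{1}{2^2}\cos\frac{4\pi}{T}t + \frac{1}{3^2}\cos\frac{6\pi}{T}t + \frac{1}{4^2}\cos\frac{8\pi}{T}t\right) - \frac{T^2}{\pi}\left(\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right)$$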



3. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function $f(t)$ over 
the interval $[0, 2\pi]$, where 

$$f(t) = \begin{cases} \sin t, & 0 \le t \le \pi \\ 0, & \pi < t \le 2\pi \end{cases}$$

Answer: 

$$\frac{1}{\pi} + \frac{1}{2}\sin t - \frac{2}{3\pi}\cos 2t - \frac{2}{15\pi}\cos 4t$$

4. Find the trigonometric polynomial of arbitrary order $n$ that is the least squares approximation to the function 
$f(t) = \sin\tfrac{1}{2}t$ over the interval $[0, 2\pi]$. 

Answer: 

$$\frac{2}{\pi} - \frac{4}{\pi}\left(\frac{\cos t}{1\cdot 3} + \frac{\cos 2t}{3\cdot 5} + \frac{\cos 3t}{5\cdot 7} + \cdots + \frac{\cos nt}{(2n-1)(2n+1)}\right)$$

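5. Find the trigonometric polynomial of arbitrary order $n$ that is the least squares approximation to the function 
$f(t)$ over the interval $[0, T]$, where 

$$f(t) = \begin{cases} t, & 0 \le t \le \tfrac{1}{2}T \\ T - t, & \tfrac{1}{2}T < t \le T \end{cases}$$

Answer: 

$$\frac{T}{4} - \frac{8T}{(2\pi)^2}\cos\frac{2\pi t}{T} - \frac{8T}{(6\pi)^2}\cos\frac{6\pi t}{T} - \frac{8T}{(10\pi)^2}\cos\frac{10\pi t}{T} - \cdots$$

where the sum runs over the odd multiples $k \le n$ of the basic frequency, the general term being $\frac{8T}{(2k\pi)^2}\cos\frac{2k\pi t}{T}$. 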



6. For the inner product 

$$\langle \mathbf{u}, \mathbf{v}\rangle = \int_0^{2\pi} u(t)v(t)\,dt$$

show that 

(a) $\|1\| = \sqrt{2\pi}$ 

(b) $\|\cos kt\| = \sqrt{\pi}$ for $k = 1, 2, \ldots$ 

(c) $\|\sin kt\| = \sqrt{\pi}$ for $k = 1, 2, \ldots$ 

7. Show that the $2n + 1$ functions 

$$1, \cos t, \cos 2t, \ldots, \cos nt, \sin t, \sin 2t, \ldots, \sin nt$$

are orthogonal over the interval $[0, 2\pi]$ relative to the inner product $\langle \mathbf{u}, \mathbf{v}\rangle$ defined in Exercise 6. 

8. If $f(t)$ is defined and continuous on the interval $[0, T]$, show that $f(T\tau/2\pi)$ is defined and continuous for $\tau$ 
in the interval $[0, 2\pi]$. Use this fact to show how Theorem 10.19.2 follows from Theorem 10.19.1. 

Section 10.19 Technology Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 



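T1. Let $g$ be the function 

$$g(t) = \frac{3 + 4\sin t}{5 - 4\cos t}$$

for $0 \le t \le 2\pi$. Use a computer to determine the Fourier coefficients 

$$\begin{Bmatrix} a_k \\ b_k \end{Bmatrix} = \frac{1}{\pi}\int_0^{2\pi}\left(\frac{3 + 4\sin t}{5 - 4\cos t}\right)\begin{Bmatrix} \cos kt \\ \sin kt \end{Bmatrix} dt$$

for $k = 0, 1, 2, 3, 4, 5$. From your results, make a conjecture about the general expressions for $a_k$ and $b_k$. Test your 
conjecture by calculating 

$$\tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

on the computer and see whether it converges to $g(t)$. 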
T2. Let g be the function 



for $0 \le t \le 2\pi$. Use a computer to determine the Fourier coefficients 




for $k = 0, 1, 2, 3, 4, 5$. From your results, make a conjecture about the general expressions for $a_k$ and $b_k$. Test your 
conjecture by calculating 

$$\tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

on the computer and see whether it converges to $g(t)$. 
10.20 Warps and Morphs 



Among the more interesting image-manipulation techniques available for computer graphics are warps and 
morphs. In this section we show how linear transformations can be used to distort a single picture to produce 
a warp, or to distort and blend two pictures to produce a morph. 

Prerequisites 

Geometry of Linear Operators on $R^2$ (Section 4.11) 
Linear Independence 
Bases in $R^2$ 



Computer graphics software enables you to manipulate an image in various ways, such as by scaling, rotating, 
or slanting the image. Distorting an image by separately moving the corners of a rectangle containing the 
image is another basic image-manipulation technique. Distorting various pieces of an image in different ways 
is a more complicated procedure that results in a warp of the picture. In addition, warping two different 
images in complementary ways and blending the warps results in a morph of the two pictures (from the Greek 
root meaning "shape" or "form"). An example is Figure 10.20.1 in which four photographs of a woman taken 
over a 50-year period (the four diagonal pictures from top left to bottom right) have been pairwise morphed 
by different amounts to suggest the gradual aging of the woman. 
Figure 10.20.1 

The most visible application of warping and morphing images has been the production of special effects in 
motion pictures and television. However, many scientific and technological applications of such techniques 
have also arisen — for example, studying the evolution, growth, and development of living organisms, 
assisting in reconstructive and cosmetic surgery, exploring various designs of a product, and "aging" 
photographs of missing persons or police suspects. 



Warps 

We begin by describing a simple warp of a triangular region in the plane. Let the three vertices of a triangle be 
given by the three noncollinear points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ (Figure 10.20.2a). We will call this triangle the 
begin-triangle. If $\mathbf{v}$ is any point in the begin-triangle, then there are unique constants $c_1$ and $c_2$ such that 

$$\mathbf{v} - \mathbf{v}_3 = c_1(\mathbf{v}_1 - \mathbf{v}_3) + c_2(\mathbf{v}_2 - \mathbf{v}_3) \qquad (1)$$

Equation 1 expresses the vector $\mathbf{v} - \mathbf{v}_3$ as a (unique) linear combination of the two linearly independent 
vectors $\mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{v}_2 - \mathbf{v}_3$ with respect to an origin at $\mathbf{v}_3$. If we set $c_3 = 1 - c_1 - c_2$, then we can rewrite 1 
as 

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 \qquad (2)$$

where 

$$c_1 + c_2 + c_3 = 1 \qquad (3)$$

from the definition of $c_3$. We say that $\mathbf{v}$ is a convex combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if 2 and 3 are 
satisfied and, in addition, the coefficients $c_1$, $c_2$, and $c_3$ are nonnegative. It can be shown (Exercise 6) that $\mathbf{v}$ 
lies in the triangle determined by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if and only if it is a convex combination of those three 
vectors. 
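
The coefficients in Equations 1 to 3 can be computed by solving the 2 × 2 linear system in Equation 1. The sketch below uses Cramer's rule and then applies the nonnegativity test; the function names and the sample triangle are illustrative only.

```python
def convex_coefficients(v, v1, v2, v3):
    """Solve v - v3 = c1 (v1 - v3) + c2 (v2 - v3) and set c3 = 1 - c1 - c2."""
    ax, ay = v1[0] - v3[0], v1[1] - v3[1]
    bx, by = v2[0] - v3[0], v2[1] - v3[1]
    px, py = v[0] - v3[0], v[1] - v3[1]
    det = ax * by - bx * ay
    if det == 0:
        raise ValueError("v1, v2, v3 are collinear")
    c1 = (px * by - bx * py) / det     # Cramer's rule for the first unknown
    c2 = (ax * py - px * ay) / det     # Cramer's rule for the second unknown
    return c1, c2, 1.0 - c1 - c2


def in_triangle(v, v1, v2, v3, eps=1e-12):
    """True when v is a convex combination of the three vertices."""
    return all(c >= -eps for c in convex_coefficients(v, v1, v2, v3))


v1, v2, v3 = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
print(convex_coefficients((1.0, 1.0), v1, v2, v3))
print(in_triangle((1.0, 1.0), v1, v2, v3), in_triangle((3.0, 3.0), v1, v2, v3))
```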
Figure 10.20.2 

Next, given three noncollinear points $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ of an end-triangle (Figure 10.20.2b), there is a unique 
affine transformation that maps $\mathbf{v}_1$ to $\mathbf{w}_1$, $\mathbf{v}_2$ to $\mathbf{w}_2$, and $\mathbf{v}_3$ to $\mathbf{w}_3$. That is, there is a unique 2 × 2 invertible 
matrix $M$ and a unique vector $\mathbf{b}$ such that 

$$\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b} \quad \text{for } i = 1, 2, 3 \qquad (4)$$

(See Exercise 5 for the evaluation of $M$ and $\mathbf{b}$.) Moreover, it can be shown (Exercise 3) that the image $\mathbf{w}$ of the 
vector $\mathbf{v}$ in 2 under this affine transformation is 

$$\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 \qquad (5)$$

This is a basic property of affine transformations: They map a convex combination of vectors to the same 
convex combination of the images of the vectors. 
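
One way to compute $M$ and $\mathbf{b}$ in Equation 4 is to subtract the third vertex from the other two, which leaves a 2 × 2 system for $M$. The following sketch does this and then checks Equation 5 on one sample point; it is only one possible approach, and the helper names and sample triangles are hypothetical.

```python
def affine_from_triangles(begin, end):
    """Return (M, b) with end[i] = M begin[i] + b for the three vertices (Equation 4)."""
    (v1, v2, v3), (w1, w2, w3) = begin, end
    # Columns of V and W are the edge vectors measured from the third vertex,
    # so that M V = W.
    V = [[v1[0] - v3[0], v2[0] - v3[0]],
         [v1[1] - v3[1], v2[1] - v3[1]]]
    W = [[w1[0] - w3[0], w2[0] - w3[0]],
         [w1[1] - w3[1], w2[1] - w3[1]]]
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    if det == 0:
        raise ValueError("begin-triangle vertices are collinear")
    V_inv = [[V[1][1] / det, -V[0][1] / det],
             [-V[1][0] / det, V[0][0] / det]]
    M = [[sum(W[i][k] * V_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    b = [w3[i] - (M[i][0] * v3[0] + M[i][1] * v3[1]) for i in range(2)]
    return M, b


def apply_affine(M, b, v):
    """Compute M v + b for a point v in the plane."""
    return (M[0][0] * v[0] + M[0][1] * v[1] + b[0],
            M[1][0] * v[0] + M[1][1] * v[1] + b[1])


def combine(c, points):
    """Return c1 p1 + c2 p2 + c3 p3."""
    return (sum(ci * p[0] for ci, p in zip(c, points)),
            sum(ci * p[1] for ci, p in zip(c, points)))


begin = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
end = [(1.0, 1.0), (3.0, 1.0), (1.0, 4.0)]
M, b = affine_from_triangles(begin, end)

c = (0.2, 0.3, 0.5)                       # nonnegative coefficients that add to one
w_direct = apply_affine(M, b, combine(c, begin))
w_convex = combine(c, end)                # right side of Equation 5
print(M, b)
print(w_direct, w_convex)
```

The two printed points should agree up to rounding, which is the content of Equation 5 for this sample point.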

Now suppose that the begin-triangle contains a picture within it (Figure 10.20.3a). That is, to each point in the 
begin-triangle we assign a gray level, say 0 for white and 100 for black, with any other gray level lying 
between 0 and 100. In particular, let a scalar-valued function $\rho_0$, called the picture-density of the 
begin-triangle, be defined so that $\rho_0(\mathbf{v})$ is the gray level at the point $\mathbf{v}$ in the begin-triangle. We can now define a 
picture in the end-triangle, called a warp of the original picture, with a picture-density $\rho_1$ by defining the gray 
level at the point $\mathbf{w}$ within the end-triangle to be the gray level of the point $\mathbf{v}$ in the begin-triangle that maps 
onto $\mathbf{w}$. In equation form, the picture-density $\rho_1$ is determined by 

$$\rho_1(\mathbf{w}) = \rho_0(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) \qquad (6)$$

In this way, as $c_1$, $c_2$, and $c_3$ vary over all nonnegative values that add to one, 5 generates all points $\mathbf{w}$ in the 
end-triangle, and 6 generates the gray levels $\rho_1(\mathbf{w})$ of the warped picture at those points (Figure 10.20.3b). 
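
In a program it is often convenient to evaluate Equation 6 in the reverse direction: start from a point in the end-triangle, find its convex coefficients, and look up the gray level at the corresponding point of the begin-triangle. The sketch below takes that approach; the function names and the sample picture-density are assumptions made for illustration.

```python
def convex_coefficients(p, p1, p2, p3):
    """Coefficients c1, c2, c3 with p = c1 p1 + c2 p2 + c3 p3 and c1 + c2 + c3 = 1."""
    ax, ay = p1[0] - p3[0], p1[1] - p3[1]
    bx, by = p2[0] - p3[0], p2[1] - p3[1]
    px, py = p[0] - p3[0], p[1] - p3[1]
    det = ax * by - bx * ay
    if det == 0:
        raise ValueError("triangle vertices are collinear")
    c1 = (px * by - bx * py) / det
    c2 = (ax * py - px * ay) / det
    return c1, c2, 1.0 - c1 - c2


def warp_density(rho0, begin, end):
    """Return rho1 for the end-triangle, following Equation 6."""
    def rho1(w):
        c = convex_coefficients(w, *end)
        if min(c) < -1e-12:
            raise ValueError("w lies outside the end-triangle")
        # Same coefficients, applied to the begin-triangle vertices.
        v = (sum(ci * p[0] for ci, p in zip(c, begin)),
             sum(ci * p[1] for ci, p in zip(c, begin)))
        return rho0(v)
    return rho1


def rho0(v):
    """Sample picture-density: gray level grows from 0 to 100 with the x-coordinate."""
    return 100.0 * v[0]


begin = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
end = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
rho1 = warp_density(rho0, begin, end)
print(rho1((1.0, 0.5)))
```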

Figure 10.20.3 




Equation 6 determines a very simple warp of a picture within a single triangle. More generally, we can break 
up a picture into many triangular regions and warp each triangular region differently. This gives us much 
freedom in designing a warp through our choice of triangular regions and how we change them. To this end, 
suppose we are given a picture contained within some rectangular region of the plane. We choose $n$ points $\mathbf{v}_1$, 
$\mathbf{v}_2, \ldots, \mathbf{v}_n$ within the rectangle, which we call vertex points, so that they fall on key elements or features of 
the picture we wish to warp (Figure 10.20.4a). Once the vertex points are chosen, we complete a 
triangulation of the rectangular region; that is, we draw line segments between the vertex points in such a 
way that we have the following conditions (Figure 10.20.4b): 

1. The line segments form the sides of a set of triangles. 

2. The line segments do not intersect. 

3. Each vertex point is the vertex of at least one triangle. 

4. The union of the triangles is the rectangle. 

5. The set of triangles is maximal (i.e., no more vertices can be connected). 

Note that condition 4 requires that each corner of the rectangle containing the picture be a vertex point. 

Figure 10.20.4 

One can always form a triangulation from any $n$ vertex points, but the triangulation is not necessarily unique. 



For example, Figures 10.20.4b and 10.20.4c are two different triangulations of the set of vertex points in 
Figure 10.20.4a. Since there are various computer algorithms that perform triangulations very quickly, it is 
not necessary to perform the tiresome triangulation task by hand; one need only specify the desired vertex 
points and let a computer generate a triangulation from them. If $n$ is the number of vertex points chosen, it can 
be shown that the number of triangles $m$ of any triangulation of those points is given by 

$$m = 2n - 2 - k \qquad (7)$$

where $k$ is the number of vertex points lying on the boundary of the rectangle, including the four situated at 
the corner points. 
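
Equation 7 is easy to sanity-check on small cases, as in the sketch below (the function name is illustrative). A rectangle with only its four corners as vertex points is split into two triangles by one diagonal, and adding a single interior vertex point gives four triangles.

```python
def triangle_count(n_vertices, n_boundary):
    """Number of triangles m = 2n - 2 - k from Equation 7."""
    if n_boundary < 4 or n_boundary > n_vertices:
        raise ValueError("need at least the four corners, and k cannot exceed n")
    return 2 * n_vertices - 2 - n_boundary


print(triangle_count(4, 4))    # corners only
print(triangle_count(5, 4))    # corners plus one interior vertex point
print(triangle_count(6, 5))    # one extra vertex point on a side, one in the interior
```

Read the other way, the formula gives $k = 2n - 2 - m$, so a triangulation with 94 vertex points and 179 triangles would have 7 vertex points on the boundary of the rectangle.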

The warp is specified by moving the $n$ vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ to new locations $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ according 
to the changes we desire in the picture (Figures 10.20.5a and 10.20.5b). However, we impose two restrictions 
on the movements of the vertex points: 

1. The four vertex points at the corners of the rectangle are to remain fixed, and any vertex point on a side of 
the rectangle is to remain fixed or move to another point on the same side of the rectangle. All other vertex 
points are to remain in the interior of the rectangle. 

2. The triangles determined by the triangulation are not to overlap after their vertices have been moved. 

The first restriction guarantees that the rectangular shape of the begin-picture is preserved. The second 
restriction guarantees that the displaced vertex points still form a triangulation of the rectangle and that the 
new triangulation is similar to the original one. For example, Figure 10.20.5c is not an allowable movement 
of the vertex points shown in Figure 10.20.5a. Although a violation of this condition can be handled 
mathematically without too much additional effort, the resulting warps usually produce unnatural results and 
we will not consider them here. 
Figure 10.20.5 

Figure 10.20.6 is a warp of a photograph of a woman using a triangulation with 94 vertex points and 179 
triangles. Note that the vertex points in the begin-triangulation are chosen to lie along key features of the 
picture (hairline, eyes, lips, etc.). These vertex points were moved to final positions corresponding to those 
same features in a picture of the woman taken 20 years after the begin-picture. Thus, the warped picture 
represents the woman forced into her older shape but using her younger gray levels. 
Figure 10.20.6



Time-Varying Warps 

A time-varying warp is the set of warps generated when the vertex points of the begin-picture are moved
continually in time from their original positions to specified final positions. This gives us a motion picture
in which the begin-picture is continually warped to a final warp. Let us choose time units so that t = 0
corresponds to our begin-picture and t = 1 corresponds to our final warp. The simplest way of moving the
vertex points from time 0 to time 1 is with constant velocity along straight-line paths from their initial
positions to their final positions.



To describe such a motion, let u_i(t) denote the position of the ith vertex point at any time t between 0 and 1.
Thus u_i(0) = v_i (its given position in the begin-picture) and u_i(1) = w_i (its given position in the final warp).
In between, we determine its position by

u_i(t) = (1 - t) v_i + t w_i (8)

Note that 8 expresses u_i(t) as a convex combination of v_i and w_i for each t in [0, 1]. Figure 10.20.7
illustrates a time-varying triangulation of a plain rectangular region with six vertex points. The lines
connecting the vertex points at the different times are the space-time paths of these vertex points in this
space-time diagram.
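
The straight-line motion in Equation 8 can be sketched in a few lines of Python. This is only an illustration: the function name interpolate_vertices and the sample coordinates are assumptions made for the example and do not come from the figures.

```python
def interpolate_vertices(v, w, t):
    """Return u_i(t) = (1 - t) v_i + t w_i for each pair of vertex points.

    v and w are equal-length lists of (x, y) tuples (begin and final positions).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [((1 - t) * vx + t * wx, (1 - t) * vy + t * wy)
            for (vx, vy), (wx, wy) in zip(v, w)]


# Hypothetical vertex points: two stay fixed, one moves.
begin = [(0.0, 0.0), (4.0, 0.0), (1.0, 2.0)]
final = [(0.0, 0.0), (4.0, 0.0), (3.0, 1.0)]

for t in (0.0, 0.5, 1.0):
    print(t, interpolate_vertices(begin, final, t))
```

At t = 0 the function returns the begin positions and at t = 1 the final positions, which matches the two boundary conditions stated before Equation 8.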




Figure 10.20.7 

Once the positions of the vertex points are computed at time t, a warp is performed between the begin-picture
and the triangulation at time t determined by the displaced vertex points at that time. Figure 10.20.8 shows a
time-varying warp at five values of t generated from the warp between t = 0 and t = 1 shown in Figure
10.20.6.




Figure 10.20.8 



Morphs 



A time-varying morph can be described as a blending of two time-varying warps of two different pictures
using two triangulations that match corresponding features in the two pictures. One of the two pictures is
designated as the begin-picture and the other as the end-picture. First, a time-varying warp from t = 0 to
t = 1 is generated in which the begin-picture is warped into the shape of the end-picture. Then a time-varying
warp from t = 1 to t = 0 is generated in which the end-picture is warped into the shape of the begin-picture.
Finally, a weighted average of the gray levels of the two warps at each time t is produced to generate the
morph of the two images at time t.

Figure 10.20.9 shows two photographs of a woman taken 20 years apart. Below the pictures are two 
corresponding triangulations in which corresponding features of the two photographs are matched. The 
time-varying morph between these two pictures for five values of t between 0 and 1 is shown in Figure 
10.20.10. 
Figure 10.20.9

Figure 10.20.10



The procedure for producing such a morph is outlined in the following nine steps (Figure 10.20.11): 

Step 1 Given a begin-picture with picture-density ρ_0 and an end-picture with picture-density ρ_1, position n
vertex points v_1, v_2, ..., v_n in the begin-picture at key features of that picture.

Step 2 Position n corresponding vertex points w_1, w_2, ..., w_n in the end-picture at the corresponding key
features of that picture.

Step 3 Triangulate the begin- and end-pictures in similar ways by drawing lines between corresponding
vertex points in both pictures.

Step 4 For any time t between 0 and 1, find the vertex points u_1(t), u_2(t), ..., u_n(t) of the morph picture at
that time, using the formula

u_i(t) = (1 - t) v_i + t w_i, i = 1, 2, ..., n (9)

Step 5 Triangulate the morph picture at time t similar to the begin- and end-picture triangulations.

Step 6 For any point u in the morph picture at time t, find the triangle in the triangulation of the morph
picture in which it lies and the vertices u_I(t), u_J(t), and u_K(t) of that triangle. (See Exercise 1 to
determine whether a given point lies in a given triangle.)

Step 7 Express u as a convex combination of u_I(t), u_J(t), and u_K(t) by finding the constants c_I, c_J, and
c_K such that

u = c_I u_I(t) + c_J u_J(t) + c_K u_K(t) (10)

and

c_I + c_J + c_K = 1 (11)

Step 8 Determine the locations of the point u in the begin- and end-pictures using

v = c_I v_I + c_J v_J + c_K v_K (in the begin-picture) (12)

and

w = c_I w_I + c_J w_J + c_K w_K (in the end-picture) (13)

Step 9 Finally, determine the picture-density ρ_t(u) of the morph-picture at the point u using

ρ_t(u) = (1 - t)ρ_0(v) + tρ_1(w) (14)

Step 9 is the key step in distinguishing a warp from a morph. Equation 14 takes weighted averages of the gray
levels of the begin- and end-pictures to produce the gray levels of the morph-picture. The weights depend on
the fraction of the distances that the vertex points have moved from their beginning positions to their ending
positions. For example, if the vertex points have moved one-fourth of the way to their destinations (i.e., if
t = 0.25), then we use one-fourth of the gray levels of the end-picture and three-fourths of the gray levels of
the begin-picture. Thus, as time progresses, not only does the shape of the begin-picture gradually change into
the shape of the end-picture (as in a warp) but the gray levels of the begin-picture also gradually change into
the gray levels of the end-picture.
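
Steps 7 through 9 can be sketched for a single triangle as follows. This is a minimal Python illustration under stated assumptions, not the procedure used to produce the figures: the names barycentric, combine, and morph_density are hypothetical, the two picture-densities are passed in as ordinary functions of a point, and the search for the enclosing triangle (Step 6) is assumed to have been done already.

```python
def barycentric(u, a, b, c):
    """Solve u = cI*a + cJ*b + cK*c with cI + cJ + cK = 1 (Equations 10 and 11).

    Substituting cK = 1 - cI - cJ leaves a 2 x 2 system, solved by Cramer's rule.
    """
    d = (a[0] - c[0]) * (b[1] - c[1]) - (b[0] - c[0]) * (a[1] - c[1])
    if d == 0:
        raise ValueError("triangle vertices are collinear")
    cI = ((u[0] - c[0]) * (b[1] - c[1]) - (b[0] - c[0]) * (u[1] - c[1])) / d
    cJ = ((a[0] - c[0]) * (u[1] - c[1]) - (u[0] - c[0]) * (a[1] - c[1])) / d
    return cI, cJ, 1.0 - cI - cJ


def combine(coeffs, tri):
    """Form the combination of the three triangle vertices (Equations 12 and 13)."""
    return tuple(sum(c * p[k] for c, p in zip(coeffs, tri)) for k in range(2))


def morph_density(u, t, tri_v, tri_w, rho0, rho1):
    """Gray level of the morph-picture at point u and time t (Equation 14)."""
    # Vertices of the morph triangle at time t (Equation 9).
    tri_u = [tuple((1 - t) * p + t * q for p, q in zip(v, w))
             for v, w in zip(tri_v, tri_w)]
    coeffs = barycentric(u, *tri_u)
    v = combine(coeffs, tri_v)   # location in the begin-picture
    w = combine(coeffs, tri_w)   # location in the end-picture
    return (1 - t) * rho0(v) + t * rho1(w)


# Hypothetical data: one triangle and two made-up density functions.
tri_v = [(0, 0), (4, 0), (0, 4)]
tri_w = [(0, 0), (4, 0), (4, 4)]
rho0 = lambda p: p[0]          # begin-picture gray level (assumed)
rho1 = lambda p: 10 + p[1]     # end-picture gray level (assumed)

print(morph_density((2, 1), 0.5, tri_v, tri_w, rho0, rho1))  # 6.25
```

In this example the point (2, 1) has coefficients 0.375, 0.375, and 0.25 in the morph triangle, so it corresponds to (1.5, 1) in the begin-picture and (2.5, 1) in the end-picture, and the two gray levels 1.5 and 11 are averaged with equal weights at t = 0.5.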

Figure 10.20.11 (Time = 0: begin-picture, given density ρ_0(v). Time = t: morph-picture, computed density
ρ_t(u) = (1 - t)ρ_0(v) + tρ_1(w). Time = 1: end-picture, given density ρ_1(w).)

The procedure described above to generate a morph is cumbersome to perform by hand, but it is the kind of 
dull, repetitive procedure at which computers excel. A successful morph demands good preparation and 
requires more artistic ability than mathematical ability. (The software designer is required to have the 
mathematical ability.) The two photographs to be morphed should be carefully chosen so that they have 
matching features, and the vertex points in the two photographs also should be carefully chosen so that the 
triangles in the two resulting triangulations contain similar features of the two pictures. When the procedure is 
done correctly, each frame of the morph should look just as "real" as the begin- and end-pictures. 

The techniques we have discussed in this section can be generalized in numerous ways to produce much more 
elaborate warps and morphs. For example: 

1. If the pictures are in color, the three components of the picture colors (red, green, and blue) can be 
morphed separately to produce a color morph. 

2. Rather than following straight-line paths to their destinations, the vertices of a triangulation can be directed 
separately along more complicated paths to produce a variety of results. 

3. Rather than travel with constant speeds along their paths, the vertices of a triangulation can be directed to 
have different speeds at different times. For example, in a morph between two faces, the hairline can be 
made to change first, then the nose, and so forth. 

4. Similarly, the gray-level mixing of the begin-picture and end-picture at different times and different 
vertices can be varied in a more complicated way than that in Equation 14. 

5. One can morph two surfaces in three-dimensional space (representing two complete heads, for example) 
by triangulating the surfaces and using the techniques in this section. 




6. One can morph two solids in three-dimensional space (for example, two three-dimensional tomographs of 
a beating human heart at two different times) by dividing the two solids into corresponding tetrahedral 
regions. 

7. Two film strips can be morphed frame by frame by different amounts between each pair of frames to 
produce a morphed film strip in which, say, an actor walking along a set is gradually morphed into an ape 
walking along the set. 

8. Instead of using straight lines to triangulate two pictures to be morphed, more complicated curves, such as 
spline curves, can be matched between the two pictures. 

9. Three or more pictures can be morphed together by generalizing the formulas given in this section. 

These and other generalizations have made warping and morphing two of the most active areas in computer 
graphics. 



Exercise Set 10.20 



1. Determine whether the vector v is a convex combination of the vectors v_1, v_2, and v_3. Do this by solving
Equations 1 and 3 for c_1, c_2, and c_3 and ascertaining whether these coefficients are nonnegative.



(a)-(d) [vector entries missing in source]

Answer:



(a) Yes; v = ^vi + |v2 + |v3 

(b) No; V = jvi + jV2 - jV3 

(c) Yes; V = jvi + •|v2 + 0v3 

(d) Yes; v = ^vj + ^V2 + ^V3 

2. Verify Equation 7 for the two triangulations given in Figure 10.20.4. 



Answer: 



m = number of triangles = 7, n = number of vertex points = 7, k = number of boundary vertex points
= 5; Equation (7) is 7 = 2(7) - 2 - 5.

3. Let an affine transformation be given by a 2 x 2 matrix M and a two-dimensional vector b. Let
v = c_1 v_1 + c_2 v_2 + c_3 v_3, where c_1 + c_2 + c_3 = 1; let w = Mv + b; and let w_i = M v_i + b for i = 1, 2, 3.
Show that w = c_1 w_1 + c_2 w_2 + c_3 w_3. (This shows that an affine transformation maps a convex
combination of vectors to the same convex combination of the images of the vectors.)



Answer: 



w = Mv + b = M(c_1 v_1 + c_2 v_2 + c_3 v_3) + (c_1 + c_2 + c_3)b
= c_1(M v_1 + b) + c_2(M v_2 + b) + c_3(M v_3 + b) = c_1 w_1 + c_2 w_2 + c_3 w_3

^* (a) Exhibit a triangulation of the points in Figure 10.20.4 in which the points V3, V5, and form the 
vertices of a single triangle. 

(b) Exhibit a triangulation of the points in Figure 10.20.4 in which the points V2, and vy do not form 
the vertices of a single triangle. 




5. Find the 2 x 2 matrix M and two-dimensional vector b that define the affine transformation that maps the
three vectors v_1, v_2, and v_3 to the three vectors w_1, w_2, and w_3. Do this by setting up a system of six
linear equations for the four entries of the matrix M and the two entries of the vector b.

Answer:



6. (a) Let a and b be linearly independent vectors in the plane. Show that if c_1 and c_2 are nonnegative
numbers such that c_1 + c_2 = 1, then the vector c_1 a + c_2 b lies on the line segment connecting the tips
of the vectors a and b.

(b) Let a and b be linearly independent vectors in the plane. Show that if c_1 and c_2 are nonnegative
numbers such that c_1 + c_2 ≤ 1, then the vector c_1 a + c_2 b lies in the triangle connecting the origin
and the tips of the vectors a and b. [Hint: First examine the vector c_1 a + c_2 b multiplied by the scale
factor 1 / (c_1 + c_2).]

(c) Let v_1, v_2, and v_3 be noncollinear points in the plane. Show that if c_1, c_2, and c_3 are nonnegative
numbers such that c_1 + c_2 + c_3 = 1, then the vector c_1 v_1 + c_2 v_2 + c_3 v_3 lies in the triangle
connecting the tips of the three vectors. [Hint: Let a = v_1 - v_3 and b = v_2 - v_3, and then use
Equation 1 and part (b) of this exercise.]

7. (a) What can you say about the coefficients c_1, c_2, and c_3 that determine a convex combination
v = c_1 v_1 + c_2 v_2 + c_3 v_3 if v lies on one of the three vertices of the triangle determined by the three
vectors v_1, v_2, and v_3?

(b) What can you say about the coefficients c_1, c_2, and c_3 that determine a convex combination
v = c_1 v_1 + c_2 v_2 + c_3 v_3 if v lies on one of the three sides of the triangle determined by the three
vectors v_1, v_2, and v_3?

(c) What can you say about the coefficients c_1, c_2, and c_3 that determine a convex combination
v = c_1 v_1 + c_2 v_2 + c_3 v_3 if v lies in the interior of the triangle determined by the three vectors v_1, v_2,
and v_3?

Answer: 

(a) Two of the coefficients are zero. 

(b) At least one of the coefficients is zero. 

(c) None of the coefficients are zero. 

8. (a) The centroid of a triangle lies on the line segment connecting any one of the three vertices of the

triangle with the midpoint of the opposite side. Its location on this line segment is two-thirds of the 
distance from the vertex. If the three vertices are given by the vectors vi, V2, and V3, write the 
centroid as a convex combination of these three vectors. 

(b) Use your result in part (a) to find the vector defining the centroid of the triangle with the three vertices 

[21 [51 .rii 



Answer: 



(a) (1/3)v_1 + (1/3)v_2 + (1/3)v_3

(b) (8/3, 2)



Section 10.20 Technology Exercises 



The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra 
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 



T1. To warp or morph a surface in R^3 we must be able to triangulate the surface. Let v_1 = (v_11, v_12, v_13),
v_2 = (v_21, v_22, v_23), and v_3 = (v_31, v_32, v_33) be three noncollinear vectors on the surface. Then a vector
v = (v_1, v_2, v_3) lies in the triangle formed by these three vectors if and only if v is a convex combination
of the three vectors; that is, v = c_1 v_1 + c_2 v_2 + c_3 v_3 for some nonnegative coefficients c_1, c_2, and c_3
whose sum is 1.

(a) Show that in this case, c_1, c_2, and c_3 are solutions of the following linear system:

[ v_11  v_21  v_31 ]           [ v_1 ]
[ v_12  v_22  v_32 ] [ c_1 ]   [ v_2 ]
[ v_13  v_23  v_33 ] [ c_2 ] = [ v_3 ]
[  1     1     1   ] [ c_3 ]   [  1  ]

In parts (b)-(d) determine whether the vector v is a convex combination of the vectors v_1 =

T2. To warp or morph a solid object in R^3 we first partition the object into disjoint tetrahedrons. Let
v_1, v_2, v_3, and v_4 be four noncoplanar vectors. Then a vector v
lies in the solid tetrahedron formed by these four vectors if and only if v is a convex combination of
the four vectors; that is, v = c_1 v_1 + c_2 v_2 + c_3 v_3 + c_4 v_4 for some nonnegative coefficients c_1, c_2, c_3, and
c_4 whose sum is one.

(a) Show that in this case, c_1, c_2, c_3, and c_4 are solutions of the following linear system:

In parts (b)-(d) determine whether the vector v is a convex combination of the vectors v_1 =

APPENDIX A



How to Read Theorems 



Since many of the most important concepts in linear algebra occur as theorem statements, it is important to be 
familiar with the various ways in which theorems can be structured. This appendix will help you to do that. 



Contrapositive Form of a Theorem 

The simplest theorems are of the form 

If H is true, then C is true, (1)

where H is a statement, called the hypothesis, and C is a statement, called the conclusion. The theorem is true
if the conclusion is true whenever the hypothesis is true, and the theorem is false if there is some case where
the hypothesis is true but the conclusion is false. It is common to denote a theorem of form 1 as

H => C (2)

(read, "H implies C"). As an example, the theorem

If a and b are both positive numbers, then ab is a positive number. (3) 

is of form 2, where 

H = a and b are both positive numbers (4) 

C = ab is a positive number (5)

Sometimes it is desirable to phrase theorems in a negative way. For example, the theorem in 3 can be 
rephrased equivalently as 

If ab is not a positive number, then a and b are not both positive numbers. (6) 

If we write ~H to mean that 4 is false and ~C to mean that 5 is false, then the structure of the theorem in 6
is

~C => ~H (7)



In general, any theorem of form 2 can be rephrased in form 7, which is called the contrapositive of 2. If a 
theorem is true, then so is its contrapositive, and vice versa. 



Converse of a Theorem 

The converse of a theorem is the statement that results when the hypothesis and conclusion are interchanged. 
Thus, the converse of the theorem H => C is the statement C => H. Whereas the contrapositive of a true
theorem must itself be a true theorem, the converse of a true theorem may or may not be true. For example, 
the converse of 3 is the false statement 

If ab is a positive number, then a and b are both positive numbers. 

but the converse of the true theorem 

If a >b, then 2a >2b . (8) 

is the true theorem 

If 2a > 2b, then a > b . (9) 
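
The difference between the contrapositive and the converse can be checked mechanically with a truth table. The short Python sketch below treats H and C as truth values; the helper name implies is an assumption made for this illustration.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'if p then q' for Boolean p and q."""
    return (not p) or q


rows = list(product([False, True], repeat=2))

# H => C and its contrapositive ~C => ~H agree in every row of the truth table.
contrapositive_same = all(implies(h, c) == implies(not c, not h) for h, c in rows)

# H => C and its converse C => H disagree in at least one row.
converse_same = all(implies(h, c) == implies(c, h) for h, c in rows)

print(contrapositive_same, converse_same)  # True False
```

The row in which H is true and C is false is where the implication and its converse differ, which is why a true theorem can have a false converse.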



Equivalent Statements 

If a theorem H => C and its converse C => H are both true, then we say that H and C are equivalent
statements, which we denote by writing

H <=> C (10)

(read, "H and C are equivalent"). There are various ways of phrasing equivalent statements as a single
theorem. Here are three ways in which 8 and 9 can be combined into a single theorem.

Form 1

If a > b, then 2a > 2b; conversely, if 2a > 2b, then a > b.

Form 2

a > b if and only if 2a > 2b.

Form 3

The following statements are equivalent.

(i) a > b

(ii) 2a > 2b

Theorems Involving Three or More Statements 

Sometimes two true theorems will give you a third true theorem for free. Specifically, if H => C is a true
theorem, and C => D is a true theorem, then H => D must also be a true theorem. For example, the theorems

If opposite sides of a quadrilateral are parallel, then the quadrilateral is a parallelogram.

and

Opposite sides of a parallelogram have equal lengths.

imply the third theorem

If opposite sides of a quadrilateral are parallel, then they have equal lengths.

Sometimes three theorems yield equivalent statements for free. For example, if

H => C, C => D, D => H (11)

then we have the implication loop in Figure A.1 from which we can conclude that

C => H, D => C, H => D (12)

Combining this with 11 we obtain

H <=> C, C <=> D, D <=> H (13)



In summary, if you want to prove the three equivalences in 13, you need only prove the three implications in 
11. 



Figure A.1

APPENDIX B

Complex Numbers



Complex numbers arise naturally in the course of solving polynomial equations. For example, the solutions of
the quadratic equation ax^2 + bx + c = 0, which are given by the quadratic formula

x = (-b ± \sqrt{b^2 - 4ac}) / (2a)

are complex numbers if the expression inside the radical is negative. In this appendix we will review some of
the basic ideas about complex numbers that are used in this text.



Complex Numbers 

To deal with the problem that the equation x^2 = -1 has no real solutions, mathematicians of the eighteenth
century invented the "imaginary" number

i = \sqrt{-1}

which is assumed to have the property

i^2 = (\sqrt{-1})^2 = -1

but which otherwise has the algebraic properties of a real number. An expression of the form

a + bi or a + ib

in which a and b are real numbers is called a complex number. Sometimes it will be convenient to use a
single letter, typically z, to denote a complex number, in which case we write

z = a + bi or z = a + ib

The number a is called the real part of z and is denoted by Re(z), and the number b is called the imaginary
part of z and is denoted by Im(z). Thus,

Re(3 + 2i) = 3, Im(3 + 2i) = 2

Re(1 - 5i) = 1, Im(1 - 5i) = Im(1 + (-5)i) = -5

Re(7i) = Re(0 + 7i) = 0, Im(7i) = 7

Re(4) = 4, Im(4) = Im(4 + 0i) = 0

Two complex numbers are considered equal if and only if their real parts are equal and their imaginary parts 
are equal; that is, 

a+bi = c + di if and only if a =c and b = d 

A complex number z = bi whose real part is zero is said to be pure imaginary. A complex number z = a
whose imaginary part is zero is a real number, so the real numbers can be viewed as a subset of the complex
numbers.

Complex numbers are added, subtracted, and multiplied in accordance with the standard rules of algebra but
with i^2 = -1:

(a + bi) + (c + di) = (a + c) + (b + d)i (1)

(a + bi) - (c + di) = (a - c) + (b - d)i (2)

(a + bi)(c + di) = (ac - bd) + (ad + bc)i (3)

The multiplication formula is obtained by expanding the left side and using the fact that i^2 = -1. Also note
that if b = 0, then the multiplication formula simplifies to

a(c + di) = ac + adi (4)

The set of complex numbers with these operations is commonly denoted by the symbol C and is called the 
complex number system. 

EXAMPLE 1 Multiplying Complex Numbers

As a practical matter, it is usually more convenient to compute products of complex numbers by
expansion, rather than substituting in 3. For example,

(3 - 2i)(4 + 5i) = 12 + 15i - 8i - 10i^2 = (12 + 10) + 7i = 22 + 7i
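
The product in Example 1 can be checked numerically. The Python sketch below applies Formula 3 directly and compares it with Python's built-in complex type; the function name multiply is an assumption made for this illustration.

```python
def multiply(a, b, c, d):
    """Return (real part, imaginary part) of (a + bi)(c + di) using Formula 3."""
    return (a * c - b * d, a * d + b * c)


print(multiply(3, -2, 4, 5))             # (22, 7)
print(complex(3, -2) * complex(4, 5))    # (22+7j)
```

Python writes the imaginary unit as j rather than i, so 22 + 7i appears as (22+7j).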



The Complex Plane 

A complex number z = a + bi can be associated with the ordered pair (a, b) of real numbers and represented
geometrically by a point or a vector in the xy-plane (Figure B.1). We call this the complex plane. Points on
the x-axis have an imaginary part of zero and hence correspond to real numbers, whereas points on the y-axis
have a real part of zero and correspond to pure imaginary numbers. Accordingly, we call the x-axis the real
axis and the y-axis the imaginary axis (Figure B.2).




Figure B.1

Figure B.2 (horizontal real axis: real part of z; vertical imaginary axis: imaginary part of z)



Complex numbers can be added, subtracted, or multiplied by real numbers geometrically by performing these
operations on their associated vectors (Figure B.3, for example). In this sense the complex number system C
is closely related to R^2, the main difference being that complex numbers can be multiplied to produce other
complex numbers, whereas there is no multiplication operation on R^2 that produces other vectors in R^2 (the
dot product produces a scalar, not a vector in R^2).



Figure B.3 (the sum of two complex numbers; the difference of two complex numbers)

If z = a + bi is a complex number, then the complex conjugate of z, or more simply, the conjugate of z, is
denoted by \bar{z} (read, "z bar") and is defined by

\bar{z} = a - bi (5)

Numerically, \bar{z} is obtained from z by reversing the sign of the imaginary part, and geometrically it is obtained
by reflecting the vector for z about the real axis (Figure B.4).

Figure B.4

EXAMPLE 2 Some Complex Conjugates

z = 3 + 4i      \bar{z} = 3 - 4i

z = -2 - 5i     \bar{z} = -2 + 5i

z = i           \bar{z} = -i

z = 7           \bar{z} = 7

Remark The last computation in this example illustrates the fact that a real number is equal to its complex
conjugate. More generally, z = \bar{z} if and only if z is a real number.

The following computation shows that the product of a complex number z = a + bi and its conjugate
\bar{z} = a - bi is a nonnegative real number:

z\bar{z} = (a + bi)(a - bi) = a^2 - abi + bai - b^2 i^2 = a^2 + b^2 (6)



You will recognize that

\sqrt{z\bar{z}} = \sqrt{a^2 + b^2}

is the length of the vector corresponding to z (Figure B.5); we call this length the modulus (or absolute value)
of z and denote it by |z|. Thus,

|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2} (7)

Note that if b = 0, then z = a is a real number and |z| = \sqrt{a^2} = |a|, which tells us that the modulus of a real
number is the same as its absolute value as defined in beginning algebra.



Figure B.5 



EXAMPLE 3 Some Modulus Computations

z = 3 + 4i      |z| = \sqrt{3^2 + 4^2} = 5

z = -4 - 5i     |z| = \sqrt{(-4)^2 + (-5)^2} = \sqrt{41}

z = i           |z| = \sqrt{0^2 + 1^2} = 1



Reciprocals and Division 

If z ≠ 0, then the reciprocal (or multiplicative inverse) of z is denoted by 1/z (or z^{-1}) and is defined by the
property

(1/z) z = 1

This equation has a unique solution for 1/z, which we can obtain by multiplying both sides by \bar{z} and using
the fact that z\bar{z} = |z|^2 [see 7]. This yields

1/z = \bar{z} / |z|^2 (8)

If z_2 ≠ 0, then the quotient z_1/z_2 is defined to be the product of z_1 and 1/z_2. This yields the formula

z_1/z_2 = (\bar{z}_2 / |z_2|^2) z_1 = z_1 \bar{z}_2 / |z_2|^2 (9)

Observe that the expression on the right side of 9 results if the numerator and denominator of z_1/z_2 are
multiplied by \bar{z}_2. As a practical matter, this is often the best way to perform divisions of complex numbers.



EXAMPLE 4 Division of Complex Numbers

Let z_1 = 3 + 4i and z_2 = 1 - 2i. Express z_1/z_2 in the form a + bi.

Solution We will multiply the numerator and denominator of z_1/z_2 by \bar{z}_2. This yields

z_1/z_2 = z_1 \bar{z}_2 / (z_2 \bar{z}_2) = [(3 + 4i)/(1 - 2i)] · [(1 + 2i)/(1 + 2i)]

= (3 + 6i + 4i + 8i^2) / (1 - 4i^2)

= (-5 + 10i) / 5

= -1 + 2i
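
Formula 9 translates directly into code. The Python sketch below is only an illustration: the function name divide is an assumption, and it writes out the conjugate and the squared modulus explicitly instead of using Python's built-in division operator.

```python
def divide(z1, z2):
    """Compute z1 / z2 as z1 * conj(z2) / |z2|^2 (Formula 9)."""
    modulus_squared = z2.real ** 2 + z2.imag ** 2
    if modulus_squared == 0:
        raise ZeroDivisionError("z2 must be nonzero")
    return z1 * z2.conjugate() / modulus_squared


print(divide(3 + 4j, 1 - 2j))   # (-1+2j)
```

The result agrees with Example 4 and with the built-in quotient (3 + 4j) / (1 - 2j).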



The following theorems list some useful properties of the modulus and conjugate operations.

THEOREM B.1

The following results hold for any complex numbers z, z_1, and z_2.

(a) \overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2

(b) \overline{z_1 - z_2} = \bar{z}_1 - \bar{z}_2

(c) \overline{z_1 z_2} = \bar{z}_1 \bar{z}_2

(d) \overline{z_1 / z_2} = \bar{z}_1 / \bar{z}_2

(e) \overline{\bar{z}} = z

THEOREM B.2

The following results hold for any complex numbers z, z_1, and z_2.

(a) |\bar{z}| = |z|

(b) |z_1 z_2| = |z_1| |z_2|

(c) |z_1 / z_2| = |z_1| / |z_2|

(d) |z_1 + z_2| ≤ |z_1| + |z_2|



Polar Form of a Complex Number 



If z = a + bi is a nonzero complex number, and if φ is an angle from the real axis to the vector z, then, as
suggested in Figure B.6, the real and imaginary parts of z can be expressed as

a = |z| cos φ and b = |z| sin φ (10)

Thus, the complex number z = a + bi can be expressed as

z = |z|(cos φ + i sin φ) (11)

which is called a polar form of z. The angle φ in this formula is called an argument of z. The argument of z is
not unique because we can add or subtract any multiple of 2π to it to obtain a different argument of z.
However, there is only one argument whose radian measure satisfies

-π < φ ≤ π (12)

This is called the principal argument of z.

Figure B.6 (a = |z| cos φ, b = |z| sin φ)




EXAMPLE 5 Polar Form of a Complex Number

Express z = 1 - \sqrt{3} i in polar form using the principal argument.

Solution The modulus of z is

|z| = \sqrt{1^2 + (-\sqrt{3})^2} = \sqrt{4} = 2

Thus, it follows from 10 with a = 1 and b = -\sqrt{3} that

1 = 2 cos φ and -\sqrt{3} = 2 sin φ

and this implies that

cos φ = 1/2 and sin φ = -\sqrt{3}/2

The unique angle φ that satisfies these equations and whose radian measure satisfies 12 is
φ = -π/3 (Figure B.7). Thus, a polar form of z is

z = 2(cos(-π/3) + i sin(-π/3)) = 2(cos(π/3) - i sin(π/3))




Geometric Interpretation of Multiplication and Division of Complex
Numbers

We now show how polar forms of complex numbers provide geometric interpretations of multiplication and
division. Let

z_1 = |z_1|(cos φ_1 + i sin φ_1) and z_2 = |z_2|(cos φ_2 + i sin φ_2)

be polar forms of the nonzero complex numbers z_1 and z_2. Multiplying, we obtain

z_1 z_2 = |z_1||z_2|[(cos φ_1 cos φ_2 - sin φ_1 sin φ_2) + i(sin φ_1 cos φ_2 + cos φ_1 sin φ_2)]

Now applying the trigonometric identities

cos(φ_1 + φ_2) = cos φ_1 cos φ_2 - sin φ_1 sin φ_2
sin(φ_1 + φ_2) = sin φ_1 cos φ_2 + cos φ_1 sin φ_2

yields

z_1 z_2 = |z_1||z_2|[cos(φ_1 + φ_2) + i sin(φ_1 + φ_2)] (13)

which is a polar form of the complex number with modulus |z_1||z_2| and argument φ_1 + φ_2. Thus, we have
shown that multiplying two complex numbers has the geometric effect of multiplying their moduli and adding
their arguments (Figure B.8).




Figure B.8 



Similar kinds of computations show that

z_1/z_2 = (|z_1|/|z_2|)[cos(φ_1 - φ_2) + i sin(φ_1 - φ_2)] (14)

which tells us that dividing complex numbers has the geometric effect of dividing their moduli and subtracting
their arguments (both in the appropriate order).



EXAMPLE 6 Multiplying and Dividing in Polar Form

Use polar forms of the complex numbers z_1 = 1 + \sqrt{3} i and z_2 = \sqrt{3} + i to compute z_1 z_2 and z_1/z_2.

Solution Polar forms of these complex numbers are

z_1 = 2(cos(π/3) + i sin(π/3)) and z_2 = 2(cos(π/6) + i sin(π/6))

(verify). Thus, it follows from 13 that

z_1 z_2 = 4[cos(π/3 + π/6) + i sin(π/3 + π/6)] = 4[cos(π/2) + i sin(π/2)] = 4i

and from 14 that

z_1/z_2 = 1 · [cos(π/3 - π/6) + i sin(π/3 - π/6)] = cos(π/6) + i sin(π/6) = \sqrt{3}/2 + (1/2)i

As a check, let us calculate z_1 z_2 and z_1/z_2 directly:

z_1 z_2 = (1 + \sqrt{3} i)(\sqrt{3} + i) = \sqrt{3} + i + 3i + \sqrt{3} i^2 = 4i

z_1/z_2 = (1 + \sqrt{3} i)/(\sqrt{3} + i) = [(1 + \sqrt{3} i)/(\sqrt{3} + i)] · [(\sqrt{3} - i)/(\sqrt{3} - i)]
= (\sqrt{3} - i + 3i - \sqrt{3} i^2)/(3 - i^2) = (2\sqrt{3} + 2i)/4 = \sqrt{3}/2 + (1/2)i

which agrees with the results obtained using polar forms.



Remark The complex number i has a modulus of 1 and a principal argument of π/2. Thus, if z is a complex
number, then iz has the same modulus as z but its argument is greater by π/2 (= 90°); that is, multiplication
by i has the geometric effect of rotating the vector z counterclockwise by 90° (Figure B.9).
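
The rule "multiply the moduli and add the arguments" (and its counterpart for division) can be checked numerically with Python's standard cmath module. The sketch below reuses the numbers from Example 6; the variable names are chosen for this illustration only, and the comparisons are approximate because of floating-point rounding.

```python
import cmath
import math

z1 = complex(1, math.sqrt(3))
z2 = complex(math.sqrt(3), 1)

# cmath.polar returns (modulus, principal argument).
r1, phi1 = cmath.polar(z1)
r2, phi2 = cmath.polar(z2)

# Formula 13: multiply moduli, add arguments.
product = cmath.rect(r1 * r2, phi1 + phi2)
# Formula 14: divide moduli, subtract arguments.
quotient = cmath.rect(r1 / r2, phi1 - phi2)

print(product)    # approximately 4j
print(quotient)   # approximately 0.866 + 0.5j

# Multiplication by i rotates a vector counterclockwise by 90 degrees.
rotated = 1j * z1
print(cmath.phase(rotated) - cmath.phase(z1))   # approximately pi/2
```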



Figure B.9 



DeMoivre's Formula 

If n is a positive integer, and if z is a nonzero complex number with polar form

z = |z|(cos φ + i sin φ)

then raising z to the nth power yields

z^n = z · z ··· z (n factors) = |z|^n [cos(φ + φ + ... + φ) + i sin(φ + φ + ... + φ)] (n terms in each sum)

which we can write more succinctly as

z^n = |z|^n (cos nφ + i sin nφ) (15)

In the special case where |z| = 1 this formula simplifies to

z^n = cos nφ + i sin nφ

which, using the polar form for z, becomes

(cos φ + i sin φ)^n = cos nφ + i sin nφ (16)

This result is called DeMoivre's formula.
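
A quick numerical check of DeMoivre's formula is sketched below in Python. The function name demoivre_gap and the sample values of the angle and exponent are assumptions made for this illustration; the two sides agree only up to floating-point rounding.

```python
import math


def demoivre_gap(phi, n):
    """Return |(cos phi + i sin phi)^n - (cos n*phi + i sin n*phi)|."""
    lhs = complex(math.cos(phi), math.sin(phi)) ** n
    rhs = complex(math.cos(n * phi), math.sin(n * phi))
    return abs(lhs - rhs)


for n in (1, 2, 5, 12):
    print(n, demoivre_gap(0.7, n))   # each gap is close to zero
```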



Euler's Formula 

If θ is a real number, say the radian measure of some angle, then the complex exponential function e^{iθ} is
defined to be

e^{iθ} = cos θ + i sin θ (17)

which is sometimes called Euler's formula. One motivation for this formula comes from the Maclaurin series
in calculus. Readers who have studied infinite series in calculus can deduce 17 by formally substituting iθ for
x in the Maclaurin series for e^x and writing

e^{iθ} = 1 + iθ + (iθ)^2/2! + (iθ)^3/3! + (iθ)^4/4! + (iθ)^5/5! + (iθ)^6/6! + ...

= 1 + iθ - θ^2/2! - iθ^3/3! + θ^4/4! + iθ^5/5! - θ^6/6! + ...

= (1 - θ^2/2! + θ^4/4! - θ^6/6! + ...) + i(θ - θ^3/3! + θ^5/5! - ...)

= cos θ + i sin θ

where the last step follows from the Maclaurin series for cos θ and sin θ.

If z = a + bi is any complex number, then the complex exponential e^z is defined to be

e^z = e^{a+bi} = e^a e^{ib} = e^a (cos b + i sin b) (18)

It can be proved that complex exponentials satisfy the standard laws of exponents. Thus, for example. 

Answer to Exercises



Exercise Set 1.1 

1. (a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations 
3. (a) and (d) are linear systems; (b) and (c) are not linear systems 
5. (a) and (d) are both consistent 

7. (a), (d), and (e) are solutions; (b) and (c) are not solutions 



9. 



r * 7 



y = t 
^1 

^3 
X4 



11. 



a. 2x1 

3:^1 - 4x2 

b. 3x1 

7x1 



f 



c. 7x1 

^1 

d. ^1 



^2 
-2x2 
2x2 
2x2 



0 

0 
1 

I 



^2 



^3 



X4 = 



2x3 
4x3 

^3 
X3 - 

4X3 
7 

-2 

3 
4 



= 5 
= -3 

= 7 
3x4 - 



13. 



d. 



-2 6 

3 8 

9 -3 

6 -1 

0 5 ■ 

0 2 

-3 -1 



6 

[1 0 



3 4 
-1 1_ 

0 -3 

1 0 
2-1 2 

0 0-17] 



1 
0 
-3 



0 
-1 

6 



True/False 1.1 

(a) True 

(b) False 

(c) True 

(d) True 

(e) False 

(f) False 

(g) True 

(h) False 

Exercise Set 1 .2 



1. 



a. Both 

b. Both 

c. Both 

d. Both 

e. Both 

f. Both 

g. Row echelon 

3.

a. x_1 = -37, x_2 = -8, x_3 = 5

b. x_1 = 13t - 10, x_2 = 13t - 5, x_3 = -t + 2, x_4 = t

c. x_1 = -7s + 2t - 11, x_2 = s, x_3 = -3t - 4, x_4 = -3t + 9, x_5 = t

d. Inconsistent

5. x_1 = 3, x_2 = 1, x_3 = 2

7. x = t - 1, y = 2s, z = s, w = t

9. x_1 = 3, x_2 = 1, x_3 = 2
11. x = t - 1, y = 2s, z = s, w = t
13. Has nontrivial solutions 
15. Has nontrivial solutions 
17. ^1 = 0, X2 = ^, 7:3 = 0 

x\ = —s, X2 = —t = s, X2 = 4s, X4 = t 
21. Mf = t, x= -t, y = t, z = 0 
23. /l=-l, /2 = 0, /3=1, Ia = 2 

25. If a = 4, there are infinitely many solutions; if a = -4, there are no solutions; if a ≠ ±4, there is exactly one solution.
27. If a = 3, there are infinitely many solutions; if a = -3, there are no solutions; if a ≠ ±3, there is exactly one solution.

3 y 3*9 

[; ']and[; ;]arepossibleanswers. 

35. x= ±\, y= z= ±^ 

37^ a = l, i = - 6, c-2, d = \0 

39. The nonhomogeneous system will have exactly one solution. 

True/False 1 .2 

(a) True 

(b) False 

(c) False 

(d) True 

(e) True 

(f) False 

(g) True 

(h) False 

(i) False 

Exercise Set 1.3 

^' a. Undefined 

b. 4x2 

c. Undefined 

d. Undefined 

e. 5x5 

f. 5x2 

g. Undefined 

h. 5x2 



d. 



7 
-2 

7 
-5 

0 
-1 
15 
-5 

5 



6 5 
1 3 
3 7 
4 



-1 



-1 -1 
1 1 

0 

10 
5 



-7 -28 



f. 



-14 



__21 -7 -35 

e. Undefined 



22 -6 8 
-2 4 6 
10 0 4 

-39 -21 -24 
9 -6 -15 
-33 -12 -30 



11. 



13. 



h. ro 0' 

0 0 

0 0 

i. 5 
j. 

k. 168 
1. Undefined 



a. 


" 12 


-3" 






-4 


5 






4 


1 




b. Undefined 




c. 


42 


108 


75" 




12 


-3 


21 




36 


78 


63 


d. 


' 3 


45 


9' 




11 


-11 


17 




7 


17 


13 


e. 


' 3 


45 


9' 




11 


-11 


17 




7 


17 


13 



h. 



21 17 
17 35 
0 -2 



11 



1 

6 9 

-20 14 

8 16 



24 
i. 61 
j. 35 
k. 28 
1. 99 

a. [67 41 41] 

b. [63 67 57] 
[41' 

21 

67 

6" 
6 
63 

[24 56 97] 

761 



a. 


-3 




3 






-2 




12 






3 




-2 




7 




76 




3 




-2 




7 




48 


= 3 


6 


+ 6 




5 




29 




-2 


6 


+ 5 


5 


+ 4 


4 




98 


= 7 


6 


+ 4 


5 


+ 9 


4 




24 




0 






4 




56 






0 




4 




9 




97 




0 




4 




9 


b. 


64 




6 




4 




14 






6 




-2 




4 




38 




6 




-2 




4 






21 


= 6 


0 


+ 7 


3 




22 




-2 


0 




1 


-f 7 


3 




18 


= 4 


0 


+ 3 


1 


+ 5 


3 






77 




7 




5 




28 






7 




7 




5 




74 




7 




7 




5 





2 -3 5 

9 -1 1 
1 5 4 

4 0 

5 1 









7" 








-1 




^3 




0 



2 -5 



3 1 
0 -8 
9 -1 



0 



3 -1 













3 






0 


X4 




2 



a. 5x1 t 67:2 - 7:^3 = 2 
— ^1 — 2^2 I 3x2 = ^ 
4x2 - X2 = 3 



b. + X2 + X3 = 2 



15. -1 



b = 


-6, 


c = 


-1, d = 


1 


CL 1 1 


0 


0 


U 


U 


0 


0 




0 


U 


u 


0 


0 


0 


(3 33 


0 


0 


0 


0 


0 


0 




0 


0 


0 


0 


0 


0 


^55 


0 


0 


0 


0 


0 


0 


^66 


a\\ 


^5tl2 


ai2 


■^14 




^16 


0 


^^22 


^23 


^^24 


^^25 


^26 


0 


0 


•^33 






^36 


0 


0 


0 


«44 




^46 


0 


0 


0 


0 




^56 


0 


0 


0 


0 


0 


^66 


Ct 1 1 


0 


0 


n 
u 


n 
u 


0 






0 


U 


U 


0 


(331 


<^32 


a 22 


0 


0 


0 


«41 


^42 


a42 




0 


0 




^52 


^53 






0 




^62 


^63 


^64 








^12 


0 


0 


0 


0 


^21 


^22 


^23 


0 


0 


0 


0 


^32 


a33 


a34 


0 


0 


0 


0 


a43 


«44 


^45 


0 


0 


0 


0 


^54 


^55 


^56 


0 


0 


0 


0 















2x{ 4- 3x2 



2 






'-(i)=(-°) 



/(3 



27. 



One; namely, A - 



1 1 0 
1 -1 0 
0 0 0 



29. 



Four; 



True/False 1.3 

(a) True 

(b) False 

(c) False 

(d) False 

(e) True 

(f) False 

(g) False 

(h) True 

(i) True 
(j) True 
(k) True 
(l) False
(m) True 
(n) True 
(o) False 
Exercise Set 1.4



r/5 ol 0 




"/5 0 " 




'-{5 0 " 


[o 3} [ 0 3_ 








0 -3_ 



5. 



1 J_ 

5 20 

_1 J_ 

5 10 



15. 



.4 = 



1 1 
7 7 



17. 



19. 



_2_ J_ 

"13 13 

13 13 



41 15 
30 11 

11 -15 
-30 41 
'6 2] 
4 2J 

1 1 

2 -1 
'20 7 

14 6 



f. [39 13' 
[26 13_ 



21. 



0 



27. 



1 


0 


an 


0 


1 




0 


0 



27 0 0 
0 26 -18 
0 18 26 

0 0.026 0.018 
0 -0.018 0.026 

4 0 0 

0 -5 -12 

0 12 -5 

1 0 0 
0-3 3 
0 -3 -3 
16 0 0 

0 -14 -15 
0 15 -14 
25 0 0" 

0 32 -24 

0 24 32 



33. B~ 
35. 



37. 



1 


1 


1 


2 


2 


2 


1 


1 


1 


2 


2 


2 


1 


1 


1 


2 


2 


2 


1 


1 


r 


2 


2 


"2 


1 


1 


1 


2 


2 


2 


1 


0 


0 



39. 



41. 



^1 = 



J_ 
23' 



^2 = 



23 



1 



True/False 1.4

(a) False 

(b) False 

(c) False 

(d) False 

(e) False 

(f) True 

(g) True 

(h) True 

(i) False 
(j) True 
(k) False 

Exercise Set 1.5 
1. 



a. Elementary 

b. Not elementary 

c. Not elementary 



d. Not elementary 



Add 3 times row 2 to row 1 : 



1 3 
0 1 



Multiply row Iby ——: 



Add 5 times row 1 to row 3: 



0 0 

0 1 0 
0 0 1 
1 0 0 
0 1 0 
5 0 1 



d. 



Swap rows 1 and 3: 



0 0 10 
0 10 0 
10 0 0 
0 0 0 1 



Swap rows 1 and 2: EA = 



-1 -2 



Add _3 times row 2 to row 3: EA = 



Add 4 times row 3 to row 1 : EA = 



0 0 1 

0 1 0 

1 0 0 
0 0 1 

0 1 0 

1 0 0 
10 0 
0 1 0 

-2 0 1 
10 0 
0 1 0 

2 0 1 



=6 


-6" 








5 


-1_ 








2 


-1 


0 


-4 


-4" 


1 


-3 


-1 


5 


3 


-1 


9 


4 


-12 


-10 



13 28 

2 5 

3 6 



11. 



-7 4 
2 -1 

2 3 
7 7 

3 i 
7 7 



13. 


3 


11 






2 


10 






-1 


1 






1 


7 






2 


10 




15. No inverse 




17. 


1 


1 


1 




2 


2 


2 




1 


1 


1 




2 


2 


2 




1 


1 


1 




2 


2 


2 


19. 


7 


0 - 


-3 




2 








-1 


1 


0 




0 


-1 


1 



21. 



23. 



25. 



1 

4 


1 

2 


-3 


0 


1 


1 


D 


0 


8 


4 


2 


0 


0 


1 

2 


0 


1 


1 


1 


1 


40 

- 


20 


10 


5 


J_ 


J_ 


5 


1 


12 


24 


8 


4 


5 


5 


1 


1 


6 


12 


4 


~2 


5_ 


J_ 


5 


_1 






o 
o 


4 


1 


1 


1 


1 


12 


24 


8 


4 


a. 




U 


u 




0 ^ 








0 0 


1 


0 




0 0 


0 


1 

^4 


b. 


1 1 

k k 


0 


0 




0 1 


0 


0 




0 0 


1 

k 


1 

k 




0 0 


0 


1 



27. c ?t 0, 1 
-3 1 

2 2 
1 0 - 

0 4 

0 0 



29. 



31. 



33. 



35. 



"1 


0" 


"1 r 




'-4 


0" 




0 






0 


2_ 


0 1 




0 


1 


[! 


1 


] 






'1 


0 -2 




'1 


0 


0 


"1 


0 


0" 




0 


1 0 




0 


1 


3 


0 


4 


0 




0 


0 1 




0 


0 


1 


0 


0 


1 



1 0" 


4^ 


'1 -1" 


-1 1_ 


0 1 


0 1_ 



0 

i 

4 

0 0 



0 I 





'l 


0 


o" 




0 


1 


0 




4 




0 


0 


1 



[1 


0 


0' 


"1 


0 


2 


0 


1 


-3 


0 


1 


0 


|_0 


0 


1 


0 


0 


1 



37. Add -1 times the first row to the second row. Add -1 times the first row to the third row. Add -1 times the second row to the first row. Add the second row to the third row.

True/False 1.5 

(a) False 

(b) True 

(c) True 

(d) True 

(e) True 

(f) True 

(g) False 

Exercise Set 1.6 
1. x1 = 3, x2 = -1
3. x1 = -1, x2 = 4, x3 = -7
5. x = 1, y = 5, z = -1
7. x1 = 2b1 - 5b2, x2 = -b1 + 3b2



9. 



17' 



XX -- 



^2 = ^ 

li 
17 



^2 = 



11. 



J_ 
15' 

34 



15 
28 



^1 = 



1 



^2 = 



13. No conditions on b1 and b2
15. b3 = b1 - b2



19. 



11 12 
-6 -8 
-15 -21 



-3 27 26 
1 -18 -17 

9 -38 -35 



True/False 1.6 

(a) True 

(b) True 

(c) True 

(d) True 

(e) True 

(f) True 

(g) True 

Exercise Set 1.7



1. 



i 0 



-1 

0 
0 

6 



1 
"5 

0 0 

1 n 



2 
0 
3 

4 -1 
4 10 
-15 
2 -10 
18 -6 



10 



20 -20 
0 6 



11. 



0 



-6 -6 -6 



16 



"1 0" 














"4 0 0" 


1-2= 


0 9 0 




0 0 16 



1 



0 



0 l/(-2)' 



0 3^ C 
0 0 4^ 



13. Not symmetric 
15. Symmetric 
17. Not symmetric 
19. Not symmetric 
21. Not invertible 
23. = - 8 
25. -2,4 



27. 



35. 



1 0 0 
0-1 0 
0 0-1 

a. Yes 

b. No (unless n = 1)

c. Yes

d. No (unless n = 1)



39. 



43. 



0 0 -8 
0 0-4 

8 4 0 
1 10 
0 -2 



A = 



True/False 1.7 

(a) True 

(b) False 

(c) False 

(d) True 

(e) True 

(f) False 

(g) False 

(h) True 

(i) True 
(j) False 
(k) False 
(l) False
(m) True 

Exercise Set 1.8 
1. 




3. a. x3 - x4 = -500, -x1 + x4 = 100, x1 - x2 = 300, x2 - x3 = 100

b. x1 = -100 + t, x2 = -400 + t, x3 = -500 + t, x4 = t

c. For all rates to be nonnegative, we need t = 500 cars per hour, so x1 = 400, x2 = 100, x3 = 0, x4 = 500
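
The traffic-network answer can be checked mechanically. The sketch below is only an illustration: it takes the four node equations as x3 - x4 = -500, -x1 + x4 = 100, x1 - x2 = 300, x2 - x3 = 100 and the general solution with free parameter t = x4, as read from the answer above. The function names are hypothetical.

```python
def flows(t):
    """General solution from part (b), with the free variable x4 = t."""
    return (-100 + t, -400 + t, -500 + t, t)


def satisfies_network(x1, x2, x3, x4):
    """True when the four node equations from part (a) all hold."""
    return (
        x3 - x4 == -500
        and -x1 + x4 == 100
        and x1 - x2 == 300
        and x2 - x3 == 100
    )


# Smallest whole-number t for which no flow rate is negative.
smallest_t = next(t for t in range(2000) if min(flows(t)) >= 0)
print(smallest_t, flows(smallest_t))
```

Any t at or above this threshold keeps every rate nonnegative; t = 500 is the smallest such value, which is where the flow x3 drops to zero.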

7. 7,_7^_;._7._i. 



^A, /2=-|a, /3 = ^A 



Il=l4 = l5 = h = j^, I2 = I3 = 0A 
9. x1 = 1, x2 = 5, x3 = 3, and x4 = 4; the balanced equation is C3H8 + 5O2 → 3CO2 + 4H2O
11. x1 = x2 = x3 = x4 = t; the balanced equation is CH3COF + H2O → CH3COOH + HF
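
A balanced equation must have the same atom counts on both sides, which is easy to confirm in code. The sketch below assumes the first reactant in Exercise 9 is propane, C3H8 (with these coefficients the atom counts only balance for that reading), and uses the coefficients stated in the two answers above. The helper `atoms` and the dictionary encoding of each formula are illustrative choices.

```python
from collections import Counter


def atoms(side):
    """Total atom counts for one side, given (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, count in formula.items():
            total[element] += coeff * count
    return total


# Exercise 9: C3H8 + 5 O2 -> 3 CO2 + 4 H2O
C3H8, O2 = {"C": 3, "H": 8}, {"O": 2}
CO2, H2O = {"C": 1, "O": 2}, {"H": 2, "O": 1}
propane_balanced = atoms([(1, C3H8), (5, O2)]) == atoms([(3, CO2), (4, H2O)])

# Exercise 11: CH3COF + H2O -> CH3COOH + HF (every coefficient equal to t = 1)
CH3COF = {"C": 2, "H": 3, "O": 1, "F": 1}
CH3COOH, HF = {"C": 2, "H": 4, "O": 2}, {"H": 1, "F": 1}
acetyl_balanced = atoms([(1, CH3COF), (1, H2O)]) == atoms([(1, CH3COOH), (1, HF)])

print(propane_balanced, acetyl_balanced)
```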
13. p(x) = x^2 - 2x + 2

15. p(x) = \^^x-^x' 



17. 



a. Using a1 = k as a parameter, p(x) = 1 + kx + (1 - k)x^2, where -∞ < k < ∞.

b. The graphs for k = 0, 1 , 2, and 3 are shown. 




True/False 1.8

(a) True 

(b) False 

(c) True 

(d) False 

(e) False 

Exercise Set 1.9 



1. 



0.50 0.25] 

0.25 0.10 



$ 25, 290 
$ 22, 581 



0.1 0,6 0.4 
0.3 0.2 0.3 
0.4 0.1 0.2 

$31,500 
$ 26, 500 

$ 26, 300 

123.08 
202.56 

True/False 1.9 

(a) False 

(b) True 

(c) False 

(d) True 

(e) True 

Chapter 1 Supplementary Exercises 

1. 3x\ — X2 + :r4 = 1 

2x\ + 37:3 4- 3x4 = ~ ^ 



3 3 1 



2' 2^ 2' 



X2 = s, X4 = t 



3. 27:1 - 47:2 + X3 = 6 
-4x1 -\- 3x3 = -1 

X2 - X2 = 3 



5. 



7. :f = 4, y = 2, z=3 
9. a. « 5^ 0, 5t 2 

a 5£ 0, b = 2 
^a = 0,b = 2 

£2 = 0, 7t 2 



11 



• K-- 



13. 



x = 



X-- 



"-1 3 


-\ 




6 0 


1_ 




"1 -2 






3 1_ 






113 


160 


37 


37 


20 




46 


37 




37 


-2, c = 


= 3 





Exercise Set 2.1 



1. M11 = 29, C11 = 29
M12 = 21, C12 = -21
M13 = 27, C13 = 27
M21 = -11, C21 = 11
M22 = 13, C22 = 13
M23 = -5, C23 = 5
M31 = -19, C31 = -19
M32 = -19, C32 = 19
M33 = 19, C33 = 19



a. M13 = 0, C13 = 0

b. M23 = -96, C23 = 96

c. M22 = -48, C22 = -48

d. M21 = 72, C21 = -72



5. 



22; 



2^ 
11 



5_ 
22 
3_ 
22 



11 



7. 



59; 



59 
7 



59 



59 59 
9. fl2_5^_^21 
11. -65 
13. -123 
15. A= 1 or -3 
17. A = 1 or — 1 
19. (all parts) — 123 
21. -40 
23. 0 
25. -240 
27. -1 
29. 0 
31. 6 

33. The determinant is sin^2 θ + cos^2 θ = 1.
35. t3f2 = ^l+A 

True/False 2.1 

(a) False 

(b) False 

(c) True 

(d) True 

(e) True 

(f) False 

(g) False 

(h) False 

(i) True 

Exercise Set 2.2 



5. 


-5 


7. 


-1 


9. 


1 


11. 


5 


13. 


33 


15. 


6 



19. Exercise 14: 39; Exercise 15: 6; Exercise 16: -1/6; Exercise 17: -2

21. -6 
23. 72 
25. -6 

27. 18 

True/False 2.2 

(a) True 

(b) True 

(c) False 

(d) False 

(e) True 

(f) True 

Exercise Set 2.3 

7. Invertible 
9. Invertible 



17. 



-2 



11. Not invertible 
13. Invertible 
15. 5±/i7 

'''^ 2 
17. -1 
19. 



21. 



^-1 = 



3 
-3 

2 

'l : 
2 : 



-5 
4 
-2 



23. 



^-1 = 



0 1 i 



0 0 

'-4 
2 
-7 

6 



3 
2 
1 

2 

3 

-1 

0 
0 



11 



11'-^ 11' 
27. ;,,-_30 ;,^__38 ,,__40 

29. Cramer's rule does not apply. 
31. y = 0 
35. 



37. 



a. 


-189 


b. 


1 




7 


c. 


8 




7 


d. 


1 




56 


e. 


7 


a. 


189 


b. 


1 




7 


c. 


8 




7 


d. 


1 




56 



True/False 2.3 

(a) False 

(b) False 

(c) True 

(d) False 

(e) True 

(f) True 

(g) True 

(h) True 

(i) True 
(j) True 
(k) True 
(l) False

Chapter 2 Supplementary Exercises 
1. -18 

3. 24 
5. -10 

7. 329 

9. Exercise 3: 24; Exercise 4: 0; Exercise 5: —10; Exercise 6: —48 
11. The matrices in Exercises 1-3 are invertible; the matrix in Exercise 4 is not.
13. -b^^5b-2\ 
15. -120 



17. 


1 


r 








6 


9 








\ 


2 








6 


9" 






1 o 


1 

8 


1 

8 




8 




1 


5 




1 




8 


24 




~24 




1 


7 




1 




4 


12 




~12 


21. 


1 


2 


1 




5 


5 


10 




1 


3 


2 




5 


5 


5 




2 


6 


3 




5 


5 


10 



10 
329 
55 
329 
_J_ 
47 
31 
"329 

25. .' = 1.^1. 



2 

"329 
11 
"329 

10 
47 

72 
329 



52 


27 


329 


329 


43 


16 


"329 


329 


25 


6 


47 


47 


102 


15 


329 


329 


4 


3 


5 





29. 



(b) 



2^2c 



, cos 7 = 



2ab 



Exercise Set 3.1 



a. P1P2 = (-1, 3)

b. P1P2 = (-3, 6, 1)

a. The terminal point is B(2, 3).

b. The initial point is A(-2, -2, -1).

a. u = (-1, 2, -4) is one possible answer.

b. u = (7, -2, -6) is one possible answer.

a. u + w = (1, -4)
b. v - 3u = (-12, 8)

c. 2(u - 5w) = (38, 28)

d. 3v - 2(u + 2w) = (4, 29)

e. -3(w - 2u + v) = (33, -12)
f. (-2u - v) - 5(v + 3w) = (37, 17)

a. (-1, 9, -11, 1)

b. (22, 53, -19, 14)

c. (-13, 13, -36, -2)
d. (-90, -114, 60, -36)

e. (-9, -5, -5, -3)

f. (27, 29, -27, 9)

a. w - u = (-9, 3, -3, -8, 5)

b. 2v + 3u = (13, -5, 14, 13, -9)

c. -w + 3(v - u) = (-14, -2, 24, 2, 7)

d. 5(-v + 4u - w) = (125, -25, -20, 75, -70)
e. -2(3w + v) + (2u + w) = (32, -10, 1, 27, -16)

f. 1/2(w - 5v + 2u) + v = (9/2, 3/2, -12, -5/2, -2)

19. a. v-w=(-2, 1, -4, -2,7) 

b. 6u+2v=(-10, 6, -4,26,28) 

c. (2u-7w)-(8v + u) = (-77,8, 94, -25,23) 

21. ^=1^3. 18 2m 

^ 3' 2' 3' 3' 6 J 

a. Not parallel 

b. Parallel 

c. Parallel 
25. a = 3, b = -1

27. c1 = 2, c2 = -1, c3 = 5
29. t^l = ^» ^2 = 1, = — 1, C4= 1 
33. , /9 1 



a. (9 _1 _ n 

2' 2) 

b. 23 _9 n 

U ' 4' 4j 



True/False 3.1 

(a) False 

(b) False 

(c) False 

(d) True 

(e) True 

(f) False 

(g) False 

(h) True 

(i) False 
(j) True 
(k) False 

Exercise Set 3.2 

'■ •"==■#=(!•-!)■ -#=(4 1) 

3. a. ||u + v|| = √83

b. ||u|| + ||v|| = √17 + √26

c. ||-2u + 2v|| = 2√3

d. ||3u - 5v + w|| = √466

a. ||3u - 5v + w|| = √2570

b. ||3u|| - 5||v|| + ||w|| = 3√46 - 10√21 + √42

c. || -||u|| v || = 2√966

7' ^ 7 
9. a. u · v = -8, u · u = 26, v · v = 24
b. u · v = 0, u · u = 54, v · v = 21

11. a. ||u - v|| = √14

b. ||u - v|| = √59

c. ||u - v|| = √677



13. a a 15 
cos^= -- 



C. COS 9= ^ 



f7 r77 ; 0 is obtuse 
136 

^^;e is obtuse 

^^•a.b = 45il 



2 

17. a. u · (v · w) does not make sense because v · w is a scalar.

b. u · (v + w) makes sense.

c. ||u · v|| does not make sense because the quantity inside the norm is a scalar.

d. (u · v) - ||u|| makes sense since the terms are both scalars.



19. 



•■(4-?) 

1 7 
5/2' 5/2 

4' 2' 4 



1 2 3 4 5 
/55' /55' /55' /55' /55 



23. a a H 

^- cos 0— 



{962 

COS0= --p= 

c. costf=0 



25. 



a. |u · v| = 10, ||u|| ||v|| = √13 √17 ≈ 14.866

b. |u · v| = 7, ||u|| ||v|| = √10 √14 ≈ 11.832
c. |u · v| = 5, ||u|| ||v|| = (3)(2) = 6
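
The three answers above are instances of the Cauchy-Schwarz inequality, |u · v| ≤ ||u|| ||v||. A short numerical check follows, using only the values quoted in the answers; the underlying vectors are not reproduced in this answer key, so the squared norms 13, 17, 10, 14, 9, 4 are read off the radicals and should be treated as assumptions.

```python
import math

# (|u . v|, ||u||^2, ||v||^2) for parts a, b, c, as read from the answers.
cases = {"a": (10, 13, 17), "b": (7, 10, 14), "c": (5, 9, 4)}

# Right-hand side of Cauchy-Schwarz, ||u|| ||v||, for each part.
bounds = {
    part: math.sqrt(nu2) * math.sqrt(nv2)
    for part, (_, nu2, nv2) in cases.items()
}

for part, (dot, _, _) in cases.items():
    print(part, dot, round(bounds[part], 3), dot <= bounds[part])
```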

27. A sphere of radius 1 centered at (x0, y0, z0).

True/False 3.2 

(a) True 

(b) True 

(c) False 

(d) True 

(e) True 

(f) False 

(g) False 

(h) False 

(i) True 
(j) True 

Exercise Set 3.3 

a. Orthogonal 

b. Not orthogonal 

c. Not orthogonal 

d. Not orthogonal 

^' a. Not an orthogonal set 

b. Orthogonal set 

c. Orthogonal set 

d. Not an orthogonal set 

7. Yes 



9. -2(x + 1) + (y - 3) - (z + 2) = 0

11. 2z = 0 

13. Not parallel 

15. Parallel 

17. Not perpendicular 

19- a. 2 

5 

b. J8_ 

{22 

21. (0. 0) (6. 2) 



23. 


(-t 


,0, ■ 


25. 


(..§, 


_r 


27. 


fr- 


1 

5' 1 


29. 


1 




31. 


1 




33. 


5 
3 




35. 


1 

{29 




37. 


11 





i 6 _9_ 21^ 
5' 5' 10' 10 J 



39. 0 (The planes coincide.) 

(b) cos ^ = cos 7=^ 

True/False 3.3 

(a) True 

(b) True 

(c) True 

(d) True 

(e) True 

(f) False 

(g) False 

Exercise Set 3.4 

1. Vector equation: (x, y) = (-4, 1) + t(0, -8);

parametric equations: x = -4, y = 1 - 8t
3. Vector equation: (x, y, z) = t(-3, 0, 1);

parametric equations: x = -3t, y = 0, z = t
5. Point: (3, -6); parallel vector: (-5, -1)
7. Point: (4, 6); parallel vector: (-6, -6)

9. Vector equation: (x, y, z) = (-3, 1, 0) + t1(0, -3, 6) + t2(-5, 1, 2);

parametric equations: x = -3 - 5t2, y = 1 - 3t1 + t2, z = 6t1 + 2t2
11. Vector equation: (x, y, z) = (-1, 1, 4) + t1(6, -1, 0) + t2(-1, 3, 1);

parametric equations: x = -1 + 6t1 - t2, y = 1 - t1 + 3t2, z = 4 + t2
13. A possible answer is vector equation: (x, y) = t(3, 2);

parametric equations: x = 3t, y = 2t
15. A possible answer is vector equation: (x, y, z) = t1(0, 1, 0) + t2(5, 0, 4);

parametric equations: x = 5t2, y = t1, z = 4t2
17. ^1= ^s — t, X2 = s, X2 = t 



21. a. (1, 0, 0) + s(-1, 1, 0) + t(-1, 0, 1)

b. a plane in R^3 passing through P(1, 0, 0) and parallel to (-1, 1, 0) and (-1, 0, 1)

23. ^ X -\- y \ z = 0 
-2x 3y =0 

b. a line through the origin in p^^ 

^' x= - ^t, y= - ^t, z = t 

25. a 2.1, 

a- x\ = ——s~\--t, X2 = S, X3 = t 

c. xi = \-^s } y^, X2 = s, X3 = 1 + ^ 

27. ;if J = 1 _ _ 1^^ X2 = s, X2 = t, ^4 = 1; The general solution of the associated homogeneous system is ;ti = — — X2 = s, 7:3 = t, X4 = 0. A 
particular solution of the given system isxi = -^, ^2 = ^^ ^3 = 0, ;^4=1. 

True/False 3.4 

(a) True 

(b) False 

(c) True 

(d) True 

(e) False 

(f) True 

Exercise Set 3.5 
1. a. (32, -6, -4)

b. (-14, -20, -82)
c. (27, 40, -42)
3. (18, 36, -18)
5. (-3, 9, -3)
7. √59

9. fm 

11. 3 
13. 7 

15. ^/374 

2 

17. 16 

19. The vectors do not lie in the same plane. 

21. -92 

23. <a!Z)c 

25. a. -3 

b. 3 

c. 3 

a. 

2 

b. ^26_ 

3 

29. 2(v × u)

37. a. iZ 
6 

b. 1 

2 

True/False 3.5 

(a) True 

(b) True 

(c) False 

(d) True 

(e) False 

(f) False 



Chapter 3 Supplementary Exercises 

1. a. 3v - 2u = (13, -3, 10)

b. ||u + v + w|| = √70

c. {m 

d. proj„u=-^(2, -5, -5) 

e. u · (v × w) = -122

f. (-5v + w) × ((u · v)w) = (-3150, -2430, 1170)

3. a. 3v - 2u = (-5, -12, 20, -2)

b. ||u + v + w|| = √106

c. √2810

d. proj^u=-^(9,l, -6, -6) 

5. Not an orthogonal set 

7. a. A line through the origin, perpendicular to the given vector.

b. A plane through the origin, perpendicular to the given vector. 

c. {0} (the origin) 

d. A line through the origin, perpendicular to the plane containing the two noncollinear vectors. 
9. True 

11. S'(-l, -1,5) 

13. /M 
V 17 
15. Jl- 

17. Vector equation: (x, y, z) = (-2, 1, 3) + t1(1, -2, -2) + t2(5, -1, -5);

parametric equations: x = -2 + t1 + 5t2, y = 1 - 2t1 - t2, z = 3 - 2t1 - 5t2
19. Vector equation: (x, y) = (0, -3) + t(8, -1);

parametric equations: x = 8t, y = -3 - t
21. A possible answer is vector equation: (x, y) = (0, -5) + t(1, 3); parametric equations: x = t, y = -5 + 3t
23. 3(x + 1) + 6(y - 5) + 2(z - 6) = 0
25. -18(x - 9) - 51y - 24(z - 4) = 0
29. A plane 

Exercise Set 4.1 

1. (a) u + v=(2,6),3u=(0,6) 

(c) Axioms 1-5 
3. The set is a vector space with the given operations. 
5. Not a vector space. Axioms 5 and 6 fail. 
7. Not a vector space. Axiom 8 fails. 
9. The set is a vector space with the given operations. 
11. The set is a vector space with the given operations. 

True/False 4.1 

(a) False 

(b) False 

(c) True 

(d) False 

(e) False 

Exercise Set 4.2 

1. (a),(c),(e) 
3. (a), (b), (d) 
5. (a), (c), (d) 
7. (a), (b), (d) 
9. (a), (b), (c) 



a. The vectors span 

b. The vectors do not span 

c. The vectors do not span 

d. The vectors span 

13. The polynomials do not span 

a. Line; x = ^ ^t, y = - ^t, z = t 

b. Line; x = 2t, y = t, z = 0

c. Origin 

d. Origin 

e. Line; x= -3t, y= -2t, z = t 

f. Plane; x - 3y + z = 0

True/False 4.2 

(a) True 

(b) True 

(c) False 

(d) False 

(e) False 

(f) True 

(g) True 

(h) False 

(i) False 
(j) True 
(k) False 

Exercise Set 4.3 

^' a. U2 is a scalar multiple of . 

b. The vectors are linearly dependent by Theorem 4.3.3. 

c. P2 is a scalar multiple of P i . 

d. 5 is a scalar multiple of ^. 

3. None 

^' a. They do not lie in a plane, 
b. They do lie in a plane. 

7 2 3 7 3 7 2 

(b) VI = -V2 - yV3, V2 = -^vi + ^V3, V3 = - -Yi + -Y2 

a. They are linearly independent since v1, v2, and v3 do not lie in the same plane when they are placed with their initial points at the origin.

b. They are not linearly independent since v1, v2, and v3 lie in the same plane when they are placed with their initial points at the origin.
21. W(x) = -x sin x - cos x ≠ 0 for some x.

b. W(x) = 2 ≠ 0
25. W(x) = 2 sin x ≠ 0 for some x.
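
The Wronskian test used in these answers needs only one point where W(x) is nonzero. As a hedged illustration for Exercise 21, suppose the two functions are f(x) = x and g(x) = cos x; this pairing is an assumption inferred from the form of the answer, W(x) = -x sin x - cos x, not something stated in the answer key. The helper name `wronskian_2` is hypothetical.

```python
import math


def wronskian_2(f, df, g, dg, x):
    """Wronskian of two functions at x: f(x) g'(x) - g(x) f'(x)."""
    return f(x) * dg(x) - g(x) * df(x)


def f(x):
    return x


def df(x):
    return 1.0


def dg(x):
    return -math.sin(x)


# A single point with W != 0 is enough to conclude linear independence.
w_at_zero = wronskian_2(f, df, math.cos, dg, 0.0)
print(w_at_zero)
```

Under this assumption the value at x = 0 is -1, which agrees with the closed form -x sin x - cos x evaluated at 0.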

True/False 4.3 

(a) False 

(b) True 

(c) False 

(d) True 

(e) True 

(f) False 

(g) True 

(h) False 

Exercise Set 4.4 



1. a. A basis for R^2 has two linearly independent vectors.

b. A basis for R^3 has three linearly independent vectors.

c. A basis for P2 has three linearly independent vectors.

d. A basis for M22 has four linearly independent vectors.
3. (a), (b) 

7- a. (w)^=(3. -7) 



b-a 



2 

9. a. (v)^=(3, -2,1) 

b. (v)^=(-2,0,l) 

11. -1,3) 
13. A = A1 - A2 + A3 - A4
15. p = 7p1 - 8p2 + 3p3

17. a. (2, 0) 

^' [j^ _ J- 

c. (0, 1) 

1/3 /3 

True/False 4.4 

(a) False 

(b) False 

(c) True 

(d) True 

(e) False 

Exercise Set 4.5 

1. Basis: (1,0, 1); dimension = 1 

3. Basis: (4, 1,0,0), (-3,0,1,0), (1, 0, 0, 1); dimension = 3 
5. No basis; dimension = 0 
7. 



■.(|,,.o).(-|.o,,) 



b. (1, 1,0), (0,0, 1) 

c. (2, -1,4) 

d. (1, 1,0), (0, 1, 1) 
9. a. n 

b. n(n + 1)/2

c. n(n + 1)/2

13. Any two of (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1) can be used. 
15. v3 = (a, b, c) with 9a - 3b - 5c ≠ 0
True/False 4.5 

(a) True 

(b) True 

(c) False 

(d) True 

(e) True 

(f) True 

(g) True 

(h) True 

(i) True 
(j) False 

Exercise Set 4.6 



^5- = 



' 3 
_-7_ 

"J_" 
28 
3_ 
14 



a 



11. 



13. 



(p)^=(4, -3,1), [p]^ = 



(p)^=(0,2, -1), [p]^ = 



4 
-3 
1 

0" 
2 
-1 



5. a. «r=(16, 10, 12) 
b. q=3 4 4;t^ 



5 = 



11 
10 

_2 
5 

0 
-2 



15 -1 

6 3 

2 



_5 

2 

11 
■ 2 



[w]b = 



"10 



[W]£' = 



-4 

-7 



3 



2 ^ 



(b) 
(c) 

(d) 
(a) 

(b) 

(d) 

(e) 



-2 -3 
5 1 

[w]5 = 

2 0] 
1 3j 

_1 1 
6 3 



_7 
2 
23 
2 

6 



[h]F' = 



1 2 3 

2 5 3 
1 0 8 
-40 16 9 

13 -5 -3 

5 -2 -1 
-239" 



[w]5 = 



77 
30 



5 
3 
1_ 

-200" 
64 
25 



15. 



17. 



(a) 


3 5 
_-l -2 




(b) 


2 5 
_-l -3 




(d) 


[w] 5i = 


2 


(e) 


r,,,i „ 

Lwj ^2 — 


3" 


a. 


3 2 


J 
2 




-2 -3 


1 

2 




5 1 


6 


b. 







[w] B2 = 
[w] 5i = 



-1 
1 

4 
-1 



9 
-9 
-5 



[w]b2 = 



_7 
2 
23 
2 

6 



19. 



23. 



fcos 2B sin 20 
sin 20 —cos 20 



a. 5= ((1,1,0), (1,0, 2),(0, 2,1)} 

''•^={(f f -f)-fr-H} (-f-f-^)} 

True/False 4.6 

(a) True 

(b) True 

(c) True 

(d) True 

(e) False 

(f) False 

Exercise Set 4.7 

1. ri = (2, -1,0,1), r2=(3, 5,7, - 1), rs = (1, 4, 2, 7) ; 





2 




-1 




0 




1 


ci = 


3 


, C2 = 


5 


, C3 = 


7 


. C4 = 


-1 




1 




4 




2 




7 







3" 


_4_ 




_-6_ 



b. b is not in the column space of A. 



-3 



= -26 



+ (^-1) 



5 
1 

-1 
-1 
1 

-1 



I t 



-f 13 



-7 



+ 4 



b. 


-2 






1 




-1 
















7 






1 


; t 


-1 
















0 




1 




1 














c. 


-1 




2 






-1 




-2 




2 




-1 




-2 




0 




1 






0 




0 




1 




0 




0 




0 


-\-r 


0 


A 


-s 


1 


0 


; 


0 


+ s 


1 


( t 


0 














0 




0 






0 




1 




0 




0 




1 



11. 



15. 



17. 



'6" 




'l' 




1 " 








1 " 


5 




5 




5 




5 




5 


7 




4 




3 




4 




3 


5 


+ s 


5 




5 




5 




5 


0 




1 




0 




1 




0 


0 




0 




1 




0 




1 



7. a. 


'r 




'2' 


ri=[102], r2=[001], ci = 


0 


. C2 = 


1 




0 




0 



b. 

ri = [1 -3 0 0], r2= [0 1 0 0], ci = 
c. ri= [1 2 4 5], r2= [0 1 -3 0], r3= [0 0 1 - 3], r4= [0 0 0 1] , 



'\ 




'-3" 


0 




1 


0 


. C2 = 


0 


0 




0 





1 




2 




4 




5 




0 




1 




-3 




0 


CI = 


0 


, C2 = 


0 




1 




-3 




0 




0 




0 




1 




0 




0 




0 




0 


ri = 


[12 -15], r2=[0143], rs = 


[0 0 1 




1 




2 




-1 




5 




0 




1 




4 




3 


ci = 


0 


, C2 = 


0 




1 




-7 




0 




0 




0 




1 



9. a. 


'\' 




'2" 


ri = [1 0 2], r2= [0 0 1]; ci = 


0 


; C2 = 


1 




0 




0 



b. 

ri = [1 -3 0 0]; r2 = [0 1 0 0], ci = 
c. ri = [l 2 4 5]; r2=[0 1 -3 0]; rs = 



r4 = [0 0 0 1]; ci = 





'-3' 




1 


; C2 = 


0 




0 


0 1 ■ 


-3]; 



r4= [0 0 0 1]; ci = 



(1, 1, -4 -3), (0, 1, 



1 




2 




4 




5 


0 




1 




-3 




0 


0 


; C2 = 


0 




1 


, C4 = 


-3 


0 




0 




0 




1 


0 




0 




0 




0 


= [014 


3]; r3 = 


:o 0 


1 -7]; 


1 




2 




-1 




5 


0 




1 




4 




3 


0 


; C2 = 


0 


; C3 = 


1 


; C4 = 


-7 


0 




0 




0 




1 



5, -2), (0,0,1, -1) 

r 



(1, -1,2,0), (0, 1,0,0), ^0,0, 1, --^ 

c. (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1) 

(b) 0 0 0 
0 1 0 
0 0 1 



3a -5a 
3b -5b 



for all real numbers a, b not both 0. 

b. Since A and B are invertible, their null spaces are the origin. The null space of C is the line 3x + y = 0. The null space of D is the entire xy-plane.

True/False 4.7 

(a) True 

(b) False 

(c) False 



(d) False 

(e) False 

(f) True 

(g) True 

(h) False 

(i) True 
(j) False 

Exercise Set 4.8 

1. Rank(A) = Rank(A^T) = 2



3. 



a. 2; 1 

b. 1;2 

c. 2; 2 

d. 2; 3 

e. 3; 2 

a. Rank = 4, nullity = 0 
b. Rank = 3, nullity = 2
c. Rank = 3, nullity = 0 



7. 



a. Yes, 0 

b. No 

c. Yes, 2 

d. Yes, 7 

e. No 

f. Yes, 4 

g. Yes, 0 

9. b1 = r, b2 = s, b3 = 4s - 3r, b4 = 2r - s, b5 = 8s - 7r
11. No

13. Rank is 2 if r = 2 and s = 1; the rank is never 1.
17. a. 3 

b. 5 

c. 3 

d. 3 



2 

2 4 



True/False 4.8 

(a) False 

(b) True 

(c) False 

(d) False 

(e) True 

(f) False 

(g) False 

(h) False 

(i) True 
(j) False 

Exercise Set 4.9 

a. Domain: /J^- codomain: 

b. Domain: codomain: 

c. Domain: codomain: /^^ 

d. Domain: codomain: 

3. R^, (-1,2,3) 

^' a. Linear; ^ 

b. Nonlinear; ~, 



c. Linear; , 

d. Nonlinear; ^ 

7. (a) and (c) are matrix transformations; (b), (d), and (e) are not matrix transformations. 



9. 



11. 



13. 



15. 



17. 



19. 



21. 



25. 



29. 



3 5 -1 

4 -1 1 
3 2-1 



7(-l,2,4) = (3, -2, -3) 



0 1 
-1 0 

1 3 
1 -1 

7 2-11 

0 1 10 
-10 0 0 

0 0 0 

0 0 0 

0 0 0 

0 0 0 

0 0 0 

0 0 0 1 

10 0 0 

0 0 10 

0 1 0 0 
10-10 

a. re -1,4) = (5, 4) 

b. T(2,l, -3) = (0, . 

a. (2, -5,-3) 

b. (2, 5, 3) 

c. (-2, -5,3) 

a. (-2,1,0) 

b. (-2,0,3) 

c. (0, 1, 3) 



•2, 0) 



_ /3-2 1+2/3 



b. (0, 1, 2/2) 

c. (-1. -2,2) 



1 



_2 ^"^^ -1 + 2/3 



b. (-2/2, 1,0) 

c. (1,2,2) 



1 8 

"9 9 

8 _1 

9 9 
4 4 
9 9 



a. Twice the orthogonal projection on the x-axis. 

b. Twice the reflection about the x-axis. 



31. Rotation through the angle 2θ.

33. Rotation through the angle θ and translation by x0; not a matrix transformation since x0 is nonzero.

35. A line in 

True/False 4.9 

(a) False 

(b) False 

(c) False 

(d) True 



(e) False 

(f) True 

(g) False 

(h) False 

(i) True 

Exercise Set 4.10 



1. T_B ∘ T_A =
5 -1 21
10 -8 4
45 3 25

T_A ∘ T_B =
-8 -3 1
-5 -15 -8
44 -11 45


3. a. T1 =
1 1
1 -1

T2 =
3 0
2 4

b. T2 ∘ T1 =
3 3
6 -2

T1 ∘ T2 =
5 4
1 -4





11. 



13. 



15. 



c. T2(T1(x1, x2)) = (3x1 + 3x2, 6x1 - 2x2),
T1(T2(x1, x2)) = (5x1 + 4x2, x1 - 4x2)
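
Composition of matrix transformations corresponds to matrix multiplication, with the transformation applied first on the right. The sketch below assumes the standard matrices of Exercise 3 are T1 = [[1, 1], [1, -1]] and T2 = [[3, 0], [2, 4]], as reconstructed from part (a); `matmul` is a small illustrative helper, not a library function.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


T1 = [[1, 1], [1, -1]]
T2 = [[3, 0], [2, 4]]

# "T2 after T1" applies T1 first, so T1 sits on the right of the product.
T2_after_T1 = matmul(T2, T1)
T1_after_T2 = matmul(T1, T2)

print(T2_after_T1)
print(T1_after_T2)
```

The rows of each product give the component formulas in part (c); for example, the rows (3, 3) and (6, -2) correspond to (3x1 + 3x2, 6x1 - 2x2). The two products differ, so these operators do not commute.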

1 0 
0 -1 

0 0" 



-10 0 

0 0 0 

0 0 1 

1 0 1 
0/20 

-1 0 1 

-1 0 0" 

0 1 0 

0 0 0 

T1 ∘ T2 = T2 ∘ T1
T1 ∘ T2 = T2 ∘ T1
T1 ∘ T2 ≠ T2 ∘ T1

Not one-to-one 

One-to-one 

One-to-one 

One-to-one 

One-to-one 

One-to-one 

One-to-one 



One-to-one; 



7 H^i, ^2) = \ h^\ ■ 



Not one-to-one 
One-to-one; 



0 -1 

0 



_-l 

Not one-to-one 

Reflection about the x-axis 
Rotation through the angle — ^ 

Contraction by a factor of y 

Reflection about the j^z-plane 
Dilation by a factor of 5 



T^-1(w1, w2) = (-w2, -w1)



17. 



19. 



21. 



23. 



25. 



27. 



29. 



a. Matrix operator 

b. Not a matrix operator 

c. Matrix operator 

d. Not a matrix operator 

a. Matrix transformation 

b. Matrix transformation 

-1 0 
0 0 
0 1 
-1 0 
0 0] 
3 oj 

a. T_A(e1) = (-1, 2, 4), T_A(e2) = (3, 1, 5), T_A(e3) = (0, 2, -3)

b. T_A(e1 + e2 + e3) = (2, 5, 6)

c. T_A(7e3) = (0, 14, -21)
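
Parts (b) and (c) follow from part (a) by linearity: the images of the standard basis vectors are the columns of A, so the image of any vector is the matching combination of those columns. A minimal sketch, taking the three images (-1, 2, 4), (3, 1, 5), (0, 2, -3) as given and using the hypothetical helper name `T_A`:

```python
# Columns of A are the images of e1, e2, e3 from part (a).
A_columns = [(-1, 2, 4), (3, 1, 5), (0, 2, -3)]


def T_A(x):
    """Image of x = (x1, x2, x3): the combination x1*col1 + x2*col2 + x3*col3."""
    return tuple(sum(x[j] * A_columns[j][i] for j in range(3)) for i in range(3))


image_of_sum = T_A((1, 1, 1))   # T_A(e1 + e2 + e3)
image_of_7e3 = T_A((0, 0, 7))   # T_A(7 e3)
print(image_of_sum, image_of_7e3)
```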

a. Yes 

b. Yes 



(b) T(xi, X2)=[xl\ 4 ^1^2) 



a. The range of T is a proper subset of R^n.

b. T must map infinitely many vectors to 0. 

True/False 4.10 

(a) False 

(b) True 

(c) True 

(d) False 

(e) False 

(f) False 

Exercise Set 4.11 



1. 



0 -1 
-1 0 
-1 0 

0 -1 
1 0 
0 0 
0 0 
0 1 



1 0 




0 


0 1 




0 


0 0 




-1 


1 


0 


0 


0 - 


•1 


0 


0 


0 


1 


-1 


0 


0 


0 


1 


0 


0 


0 


1 


0 - 


•1 


0 


1 


0 


0 


0 


0 


1 


1 0 




0 


0 0 




-1 


0 1 




0 


0 


0 


1 


0 


1 


0 


-1 


0 


0 



7. Rectangle with vertices at (0, 0), (-3, 0), (0, 1), (-3, 1) 



11. 



13. 



17. 



19. 
23. 



1 0 
4 1 
1 -2 
0 1 



Expansion by a factor of 3 in the x-direction 

Expansion by a factor of 5 in the y-direction and reflection about the x-axis
Shearing by a factor of 4 in the x-direction 

0 5 

0 -1 
-1 0_ 

y=—2x 



y= - 



11 



(b) No 



1 0 k 
0 1 k 
0 0 1 

b. Shear in the xz-direction with factor k maps (x, y, z) to (x + ky, y, z + ky):



1 k 0 
0 1 0 
0 k 1 



Shear in the yz-direction with factor k maps (x, y, z) to (x, y + kx, z + kx):



1 0 0 
k 1 0 
k 0 1 
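
The three shear matrices in this answer can be applied to a sample point to confirm the coordinate formulas. The sketch below is illustrative only; the factor k = 2, the point (1, 3, 5), and the names `apply` and `shear_matrices` are arbitrary choices made here, not part of the exercise.

```python
def apply(matrix, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return tuple(sum(row[j] * vector[j] for j in range(3)) for row in matrix)


def shear_matrices(k):
    """The three shear matrices of the answer, keyed by shear direction."""
    return {
        "xy": [[1, 0, k], [0, 1, k], [0, 0, 1]],  # (x + kz, y + kz, z)
        "xz": [[1, k, 0], [0, 1, 0], [0, k, 1]],  # (x + ky, y, z + ky)
        "yz": [[1, 0, 0], [k, 1, 0], [k, 0, 1]],  # (x, y + kx, z + kx)
    }


point = (1, 3, 5)
images = {name: apply(m, point) for name, m in shear_matrices(2).items()}
print(images)
```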



True/False 4.11 

(a) False 

(b) True 

(c) True 

(d) True 

(e) False 

(f) False 

(g) True 

Exercise Set 4.12 



1. 



a. Stochastic 

b. Not stochastic 

c. Stochastic 

d. Not stochastic 

0.54545 
0.45455 

a. Regular 

b. Not regular 

c. Regular 



17 
_9_ 
17 



11. 



13. 



4_ 
11 
A_ 
11 
J_ 
11 



Probability that something in state 1 stays in state 1 
Probability that something in state 2 moves to state 1 
0.8 
0.85 

0.95 0.55 
0.05 0.45 

0.93 
0.142 
0.63 



15. 



Year 


1 


2 


3 


4 


5 


City 


95,750 


91,840 


88,243 


84,933 


81,889 


Suburbs 


29,250 


33,160 


36,757 


40,067 


43,111 



City 


46,875 


Suburbs 


78,125 



17. 



23 
100 
46 



159 
22 
53 
47 



159 

c. 35, 50, 35 



19. 





7 


1 


1 ' 




'i 




10 


10 


5 




3 




1 


3 


1 




1 




5 


10 


2 


; q- 


3 




1 


3 


3 




1 




10 


5 


10 




3 



21. P^k q = q for every positive integer k

True/False 4.12 

(a) True 

(b) True 

(c) True 

(d) False 

(e) True 

Chapter 4 Supplementary Exercises 

1. (a) u + v=(4,3,2), -u=(-3,0, 0) 
(c) Axioms 1-5 

3. If s ≠ 1, -2, the solution space is the origin. If s = 1, the solution space is a plane through the origin. If s = -2, the solution space is a line through the origin.
7. A must be invertible
9. a. Rank = 2, nullity = 1
b. Rank = 2, nullity = 2
c. Rank = 2, nullity = n - 2



11. 



a- |l, x^, t:''^, :^^^| where 2/« =« if « is even and 2/« =« — 1 if « is odd. 
b. jx, x\ 



13. 



1 0 0 

0 0 0 

0 0 0 





0 


1 


0 




0 


0 


1 




0 


0 


0 




0 


0 


0 




0 


0 


0 




1 


0 


0 




0 


0 


0 




0 


1 


0 




0 


0 


1 




0 


0 


0 




0 


0 


0 




1 


0 


0 




0 


0 


0 




0 


1 


0 




0 


0 


1 



0 1 0 
-10 0 

0 0 0 



0 0 1 
0 0 0 
-10 0 



0 0 
0 1 
-1 0 



15. Possible ranks are 2, 1, and 0. 
Exercise Set 5.1 
1. 5 

3. a. λ^2 - 2λ - 3 = 0

b. λ^2 - 8λ + 16 = 0

c. λ^2 - 12 = 0

d. λ^2 + 3 = 0

e. λ^2 = 0

f. λ^2 - 2λ + 1 = 0
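
The real eigenvalues are the real roots of each characteristic equation. The sketch below solves the six quadratics listed above with the quadratic formula, writing each one as λ^2 + bλ + c = 0; the function name `real_roots` and the dictionary of coefficients are illustrative.

```python
import math


def real_roots(b, c):
    """Distinct real roots of x^2 + b*x + c = 0, in increasing order."""
    disc = b * b - 4 * c
    if disc < 0:
        return []  # no real roots, hence no real eigenvalues
    r = math.sqrt(disc)
    return sorted({(-b - r) / 2, (-b + r) / 2})


# (b, c) for parts a through f, read from the characteristic equations.
equations = {
    "a": (-2, -3), "b": (-8, 16), "c": (0, -12),
    "d": (0, 3), "e": (0, 0), "f": (-2, 1),
}
roots = {part: real_roots(b, c) for part, (b, c) in equations.items()}
print(roots)
```

Part (d) has a negative discriminant, which is consistent with the answer to Exercise 5(d) that there are no eigenspaces over the real numbers.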

5. a.

Basis for eigenspace corresponding to λ = 3:

b.

Basis for eigenspace corresponding to λ = 4:



11.



Basis for eigenspace corresponding to λ = -1:



Basis for eigenspace corresponding to λ = √12:



d. There are no eigenspaces.

e. Basis for eigenspace corresponding to λ = 0:

Basis for eigenspace corresponding to λ = 1:

a. 1, 2, 3

b. -√2, 0, √2

c. -8

d. 2

e. 2

f. -4, 3

a. λ^4 + λ^3 - 3λ^2 - λ + 2 = 0

b. λ^4 - 8λ^3 + 19λ^2 - 24λ + 48 = 0



^2 
1 



; basis for eigenspace corresponding to λ = -√12:



3 

1 







'0' 


_0_ 




_1_ 


"1" 




o" 


_0_ 




_1_ 



λ = 1: basis



λ = 4: basis





o" 




0 




0 




1 



13. 



15. 



\2) "512' 
2i. y = x and y 

b. No lines 

c. y = 0 

True/False 5.1 

(a) False 

(b) False 

(c) True 



2^ = 512 
2x 



λ = -2: basis



-1 

0 
1 

0 



λ = -1: basis



(d) False 

(e) True 

(f) False 

(g) False 

Exercise Set 5.2 

1. Possible reason: Determinants are different. 

3. Possible reason: Ranks are different. 

5. λ = 0: 1 or 2; λ = 1: 1; λ = 2: 1, 2, or 3

7. Not diagonalizable 

9. Not diagonalizable 

11. Not diagonalizable 
13. 



P = 



15. 



P = 



17. 



P = 



19. 



21. 



23. 



25. 



1 1 

-2 0 1 

0 1 0 

1 0 0 

1 2 1 
1 3 3 
1 3 4 

1 0 0 
0 1 0 
-3 0 1 

10 0 0 
0 11-1 

0 0 1 0 

0 0 0 1 

-1 10237 -2047 
0 1 0 
0 10245 -2048 



1 



0 



0 -1 



P-'AP-- 



3 0 0 

0 3 0 

0 0 2 

1 0 0 
0 2 0 
0 0 3 

0 0 0 

0 0 0 

0 0 1 



P~'AP-- 



2 


0 


0 


0 


0 


-2 


0 


0 


0 


0 


3 


0 


0 


0 


0 


3 



A'^^PD^P-^-. 



1 1 

2 0 
1 -1 



r 0 

0 3" 



0 
0 

0 4" 



27. 



One possibility is P =



where λ1 and λ2 are as in Exercise 20 of Section 5.1.



(3— Al (3— A2 

33. a. λ = 1: dimension = 1; λ = 3: dimension ≤ 2; λ = 4: dimension ≤ 3

b. Dimensions will be exactly 1, 2, and 3.

c. λ = 4

True/False 5.2 

(a) True 

(b) True 

(c) True 

(d) False 

(e) True 

(f) True 

(g) True 

(h) True 

Exercise Set 5.3 

1. ū = (2 + i, -4i, 1 - i), Re(u) = (2, 0, 1), Im(u) = (-1, 4, 1), ||u|| = √23
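
The conjugate and the norm in this answer can be checked with Python's built-in complex numbers. This is a minimal sketch; the vector u used here is inferred from the stated real and imaginary parts, Re(u) = (2, 0, 1) and Im(u) = (-1, 4, 1), not quoted from the exercise.

```python
import math

# u is reconstructed from Re(u) and Im(u) in the answer above (an inference).
u = [2 - 1j, 4j, 1 + 1j]

# Componentwise complex conjugate.
u_bar = [z.conjugate() for z in u]

# Euclidean norm in C^n: square root of the sum of squared moduli.
norm = math.sqrt(sum(abs(z) ** 2 for z in u))
print(u_bar, norm)
```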

5. x = (7 - 6i, -4 - 8i, 6 - 12i)



7. Ā = [5i, 4; 2 + i, 1 - 5i], Re(A) = [0, 4; 2, 1], Im(A) = [-5, 0; -1, 5], det(A) = 17 - i, tr(A) = 1



11. u · v = -1 + i, u · w = 18 - 7i, v · w = 12 + 6i
13. -11 - 14i



15. . o 

Ai=4-j, XI = 
19. |A|=/^, 
21. |A| = 2, ^=-| 



2-j 
1 

1 



A2 = 2 + i, XI = 
; A2 = 4 + i, XI = 



2 + i 
1 

1 



23. 



25. 



-2 -1 

2 0 

1 -1 

-1 0 



C-- 
C-- 



3 -2 

2 3 
5 -3 

3 5 



27. 



b. None 

True/False 5.3 

(a) False 

(b) True 

(c) False 

(d) True 

(e) False 

(f) False 

Exercise Set 5.4 



1. 



• yi =cie^^ - 2c2e 
y2 = c\e^^ -f cje' 



3- a- yi= -C2e^' -^ cse^' 

y2 = cie' \ 2c2e^'-C3e^' 

y2 = 2c2e^' -C2e^' 
b. yi=e^'-2e^' 

y2 = e'-2e^'^2e^' 

y3= - 2e^' + 26^"^ 
7. y = {:ie^' + C2e"^' 
9. y = {:ie' + C2^^^ +«^3^^' 
True/False 5.4 

(a) False 

(b) False 

(c) True 

(d) True 

(e) False 

Chapter 5 Supplementary Exercises 

1. (b) The transformation rotates vectors through the angle θ; therefore, if 0 < θ < π, then no nonzero vector is transformed into a vector in the same or opposite direction.
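
The geometric argument has an algebraic counterpart: the standard 2 x 2 rotation matrix has characteristic polynomial λ^2 - 2cos(θ)λ + 1, whose discriminant is negative whenever 0 < θ < π. The sketch below assumes that standard rotation matrix and simply evaluates the discriminant at a few sample angles.

```python
import math

def rotation_char_discriminant(theta):
    """Discriminant of lambda^2 - 2*cos(theta)*lambda + 1, the characteristic
    polynomial of the rotation matrix [[cos, -sin], [sin, cos]]."""
    return 4 * math.cos(theta) ** 2 - 4

# Sample angles strictly between 0 and pi (illustrative choices).
angles = [math.pi / 6, math.pi / 2, 2.5]
discriminants = [rotation_char_discriminant(t) for t in angles]
print(discriminants)  # all negative, so there are no real eigenvalues
```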



(c) 



1 1 0 
0 2 1 
0 0 3 



'15 


30" 




'75 


150" 




'375 


750" 




_ 5 


10 




25 


50 _ 




125 


250_ 





1875 3750 
625 1250 



11. 0, tr(A)

13. They are all 0. 



15. 



1 



0 0 



-1 -1 -4 



17. They are all 0, 1, or -1.
Exercise Set 6.1 

1- a. 5 

b. -6 

c. -3 

d. /l3 

e. {I 

f . /89 

3- a. 2 

b. 11 

c. -13 

d. -8 

e. 0 



5. 



9. 
11. 



13. 



15. 



a. -5 

b. 1 

c. -7 

d. 1 

e. 1 

f . 1 

a. 3 

b. 56 

(b) 29 

a. r/3 0 

0 /? 

b. p 0 

0 {I 

a. /74 

b. 0 



a. ^105 

b. yf47 
17. (p, q} = 50, 
19. a. 3/^ 

b. 3/5 

c. 3/T3 
21. , 



1 = 6/3 



27. 



For V = [0, 1; -1, 0], ⟨V, V⟩ = -2 < 0, so Axiom 4 fails.



29. a _28 
15 

b. 0 

True/False 6.1 

(a) True 

(b) False 

(c) True 

(d) True 

(e) False 

(f) True 

(g) False 

Exercise Set 6.2 

1- a. .JL 

f2 

b. 

c. 0 

d 20_ 

9/To 

e. _ J_ 

f. ^ 

3. a. 19 

10/7 

b. 0 

7. No 

9. a. ^=-3 

b. ^= -2, -3 

13. No 

15. a. x = t, y = -2t, z = -3t

b. 2x - 5y + 4z = 0

c. x - z = 0

•^l* a. The line y = —x 

b. The xz-plane

c. The x-axis

True/False 6.2 

(a) False 

(b) True 

(c) True 

(d) True 

(e) False 

(f) False 

Exercise Set 6.3 

1. (a), (b), (d) 
3. (b), (d) 
5. (a) 



7. 




11. 



13. 



15. 



17. 



19. 



21. 



•V2 I 2v3 



a. -2vi+l 

b. _^vi-|-V2 + 4v3 

3 15 
C. --V1--V2 I yV3 

(b) u = - ^vi - jiv2 + 0v3 + ^V4 



3 



-ui - 



3 



U2 - ^U3 



11 



w= ^U2 I -p^U3 



u /2 1 _1 _1] 
\4' A' 4' 4j 

, m 7 _ J 23 \ 

U2'4' 12' I2J 

I. f 23 11 _ J 17 

\\S' 6 ' 18' 18 

. (1 1 _1 _1] 

\2'2' 2' 2) 



VI : 



n 5 . 


3 




f 3 3 3 3 


U' 4' 


4' 


-^}w2 


[ 4'4'4' 4 



_J 3_ 

y/To' /To 



V2 = 



_J ]_ 

/To' /To 



b. VI = (1.0). V2 = (0. -1) 



}l5 fj5 j [}J30 ]l30 \J30 



V3 = 



25. 



29. 



/To' /To' /To' /To 



V4 = 



1 1 



/T?' /T?' /T?' /T? 



VI = 



J I L V2= 1'^^ - -4-1 V3= P- 0 

{l^ {I' {Ij ^ \/6' /6' ijej ^ /6' 



27. ^ fli 11 40^ M _ J_ ^ 

^14' 14' 14 }' 2 ^1^. 1^ 



1 2_ 

J_ J_ 
{I f5 



0 /5 



1 

f2 



1 

f2 



L 

1 
1 

/I 

8 

/234 

11 
/234 

7 

^234 

L 

1 
1 



1 

1 

3 

_2 

3 

2 
3 

1 

0 



"/2 3/^' 
0 /3 



1 

3 

/26 

3 



J_ 

fe 



'f2 f2 
0 --L 



0 0 



1 




3_ 




2/19 


^19 


1 


_ 


3 


f2 


2/19 




0 


3/2 


1 


v'T9 


/T9 



/5 



3/2 

/2 ^19 
^19 



f. Columns not linearly independent 
33. v1 = 1, v2 = √3(2x - 1), v3 = √5(6x^2 - 6x + 1)
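
These three polynomials can be checked for orthonormality with exact arithmetic. The sketch assumes the inner product is ⟨p, q⟩ = the integral of p(x)q(x) over [0, 1] (the exercise statement is not reproduced here, so that interval is an assumption). It works with the polynomial parts only and confirms that their squared norms are 1, 1/3 and 1/5, which is what the factors √3 and √5 compensate for.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate_01(p):
    """Integral over [0, 1] of sum(c_k x^k) is sum(c_k / (k + 1))."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(p))

def inner(p, q):
    # Assumed inner product: integral of p*q over [0, 1].
    return integrate_01(poly_mul(p, q))

p1 = [1]           # 1
p2 = [-1, 2]       # 2x - 1
p3 = [1, -6, 6]    # 6x^2 - 6x + 1

print(inner(p1, p1), inner(p2, p2), inner(p3, p3))
print(inner(p1, p2), inner(p1, p3), inner(p2, p3))
```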

True/False 6.3 

(a) False 

(b) False 

(c) True 

(d) True 

(e) False 

(f) True 

Exercise Set 6.4 



1. 







'20" 


[^2 




_20_ 



21 25 

25 35 
15 -1 5 

-1 22 30 
5 30 45 



a. :.i = 5, X2 = ^ 













9 






13 



b. x\ = 12, X2 = -3, 7:3 = 9 



3 
2 
9 
2 
-3 

3 
=3 
0 
3 



a* Solution: x 
^- Solution: x 



= 5" ^^^^^ squares error: ^/5 

= |y,oj t^(— 3, l)(^a real number); least squares error: ^^42 



^' Solution: x= |— oj I — 1, — 1, 1) a real number); least squares error: -^^294 



a. (7, 2, 9, 5) 



11. 



13. 



( 

\ 5 ' 5' 5 ' 5 

a. det(A^T A) = 0; A does not have linearly independent column vectors.

b. det(A^T A) = 0; A does not have linearly independent column vectors.



[P] 



[P] = 



10 0 

0 0 0 

0 0 1 

0 0 0 

0 1 0 

0 0 1 



15. a. (1,0, -5), (0, 1,3) 



10 15 -5 
15 26 3 
-5 3 34 



27:0 + 3^0 -ZQ 15x0 < 2^70 t 3zo -5:^0 I 3^0 I 34zo 



35 



35 



d. 3/35 
7 

17. s = t = \ 

[P] = A^T (A A^T)^-1 A

True/False 6.4 

(a) True 

(b) False 

(c) True 

(d) True 

(e) False 

(f) True 

(g) False 

(h) True 

Exercise Set 6.5 

y = 2 + 5x - 3x^2



5_ 

' 21 



48 




True/False 6.5 

(a) False 

(b) True 

(c) False 

(d) True 

Exercise Set 6.6 

1- a. (1 +ir) — 2 sinx — sin 2x 
sin 2x 



b. (l+tr)-2|^sinx-|-- 



, sin 3x , 



in«j: 1 
« J 



IS 

b. 1 _ J. 

True/False 6.6 

(a) False 

(b) True 

(c) True 

(d) False 

(e) True 

Chapter 6 Supplementary Exercises 
1. 



a. (0, a, a, 0) with a ≠ 0

a. The subspace of all matrices in M22 with only zeros on the diagonal. 

b. The subspace of all skew-symmetric matrices in M22.



(7?-»-7j) 



9. No 

(b) ^ approaches 

17. No 

Exercise Set 7.1 



1. 



(b) 



_9_ 
"25 
4 

5 

12 
"25 



(a) 
(b) 

(d) 



1 01 
0 ij 



(e) 



1 1 
/2 {2 

L J_ 

/2 /2 

L 

~f2 

J_ 

' 

J_ 

1 1 

2 2 

1 _5 

2 6 

1 1 

2 6 

1 1 

2 6 



0 


1 


f2 


2 


1 






1 


1 






1 


1 ' 


2 


2 


1 


1 


6 


6 


1 


5 


6 


6 


5 
6 


1 

6 



a. (-1 + 3/3,3^/3) 



11. 



A = 



cos θ 0 -sin θ
0 1 0
sin θ 0 cos θ

1 0 0
0 cos θ sin θ
0 -sin θ cos θ



13. 
17. 



The only possibilities are"^"*-*'^" rT'^~ /Tor"^""^'^" r-'^~" rr. 



21. 



a. Rotations about the origin, reflections about any line through the origin, and any combination of these 

b. Rotation about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these 

c. No; dilations and contractions 

True/False 7.1 

(a) False 

(b) False 

(c) False 

(d) False 

(e) True 

(f) True 

(g) True 

(h) True 

Exercise Set 7.2 

1. a. λ^2 - 5λ = 0; λ = 0: one-dimensional; λ = 5: one-dimensional

b. λ^3 - 27λ - 54 = 0; λ = 6: one-dimensional; λ = -3: two-dimensional

c. λ^3 - 3λ^2 = 0; λ = 3: one-dimensional; λ = 0: two-dimensional

d. λ^3 - 12λ^2 + 36λ - 32 = 0; λ = 2: two-dimensional; λ = 8: one-dimensional

e. λ^4 - 8λ^3 = 0; λ = 0: three-dimensional; λ = 8: one-dimensional

f. λ^4 - 8λ^3 + 22λ^2 - 24λ + 9 = 0; λ = 1: two-dimensional; λ = 3: two-dimensional
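
Each characteristic equation above should factor according to the listed eigenvalues, with each root repeated as many times as its eigenspace dimension. That is an assumption that holds when the matrices are symmetric, as in this section on orthogonal diagonalization. The sketch below rebuilds the polynomials from their roots and can be compared against the equations term by term.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def from_roots(roots):
    """Expand the product of (lambda - r) over the given roots."""
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [1, -r])
    return poly

print(from_roots([6, -3, -3]))    # part (b)
print(from_roots([2, 2, 8]))      # part (d)
print(from_roots([1, 1, 3, 3]))   # part (f)
```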



3. 



P = 



P = 



P = 



P = 



15. No 
19. Yes 









2 








3" 


5 


0 1 


0 


f ° 


4 


5 


1 


1 




f2 


1 


1 




f2 


1 


0 






4 3 




5 5 




3 4 




5 5 




0 0 




0 0 





P~'AP-- 



P~'AP = 



3 0 
0 10 



25 0 
0 -3 

0 0 



0 
0 

-50 



1 

1 

'fe 

fe 



P-'AP = 



0 0 0 
0 3 0 
0 0 3 



0 0 
0 0 



; P~'AP = 



-25 


0 


0 


0 


0 


25 


0 


0 


0 


0 


-25 


0 


0 


0 


0 


25 



True/False 7.2 



(a) True 

(b) True 

(c) False 

(d) True 

(e) True 

(f) False 

(g) True 

Exercise Set 7.3 



1. 



[^1 ^2] 



[^1 ^2] 



[x\ ^2^3] 



3. 2x^2 + 5y^2 - 6xy



3 

0 

4 -3 

-3 -9 

9 
3 



j_ j_ 

{2. {2 

_j_ j_ 

{2 {2 

2 2 

■3 3 

2 i 

3 3 

1 2 
3 3 

1 
2 



3 
-1 

1 

2 



-4 
1 

2 



Q = '^y\^y\ 



Q=y\^^yl y^yl 



0 0 
0 1 



+ [-1 6]L,U2 = 0 



+ [7-8] 



11. 



a. ellipse 

b. hyperbola 

c. parabola 

d. circle 

13. Hyperbola: 2(y')^2 - 3(x')^2 = 8; θ ≈ -26.
15. Hyperbola: 4(x')^2 - (y')^2 = 3; θ = 36.9°

a. Positive definite 

b. Negative definite 

c. Indefinite 

d. Positive semidefinite 

e. Negative semidefinite 
19. Positive definite 

21. Positive semidefinite 
23. Indefinite 
27. k > 2
31. , 



1 



1 



n{n-\) 
1 



1 



n{n-\) 

1 



1 



n{n-\) 
1 

1 



n{n — V) n{n—X) 



b. Yes 



33. A must have a positive eigenvalue of multiplicity 2. 

True/False 7.3 

(a) True 

(b) False 

(c) True 

(d) True 

(e) False 

(f) True 

(g) True 

(h) True 

(i) False 
(j) True 
(k) False 
(1) False 

Exercise Set 7.4 

1. Maximum: 5 at (1, 0) and (-1, 0); minimum: -1 at (0, 1) and (0, -1)

3. Maximum: 7 at (0, 1) and (0, -1); minimum: 3 at (1, 0) and (-1, 0) 

5. Maximum: 9 at (1, 0, 0) and (-1, 0, 0); minimum: 3 at (0, 0, 1) and (0, 0, -1) 

7. Maximum: z = 4√2 at (x, y) = (2√2, 2) and (-2√2, -2); minimum: z = -4√2 at (x, y) = (-2√2, 2) and (2√2, -2)
9. 5x'->2 = 5 

/ \ 




13. Critical points: (-1, 1), relative maximum; (0, 0), saddle point 

15. Critical points: (0, 0), relative minimum; (2, 1) and (-2, 1), saddle points 

17 5 1 

Corner points: ~ y~ ^ 

21. =A 

True/False 7.4 

(a) False 

(b) True 

(c) True 

(d) False 

(e) True 

Exercise Set 7.5 



1. A* 



A = 



3. 



A = 



-2i 4 5-1 

\ \ I 3-1 0 

1 



-I -3 
2 + 3j 1 

a. ^13 '^^Sl 

b. ^22'^^ 



2-3i 
1 
2 



9. 



11. 



= A-^ = 



3 
5 



-I \ \[3 1 -i}[3 
2/2 2/2 

H-?/3 -I -{3 
2/2 2/2 



13. 



15. 



17. 



P = 



19. 



A = 



-1 4-i 


1 - j 






1 


2 






-1 -I 


1 1 1 


V ^ 


1/3 


2 


1 




V ^ 


U 


U 


2 


-1 + j 








2 


fs 




0 


I 2 



3 0 
0 6 



2 0 
0 8 



D = 



-2 0 0 
0 1 0 
0 0 5 



0 



1 



21. 



29. 
37. 



-2-3j -1 

a. '^IS'^ 

b. '^ll 

(c) and C must commute. 
1 z 



{2. {2 

_±_ L 

^2 /2 



39. Multiplication of x by P corresponds to ||u||^2 times the orthogonal projection of x onto W = span{u}. If ||u|| = 1, then multiplication of x by H = I - 2uu* corresponds to reflection of x about the hyperplane u⊥.
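
The reflection property can be illustrated numerically in the real case. The sketch below assumes a real unit vector u, so that u* is just the transpose, and checks the two defining features of a reflection about the hyperplane orthogonal to u: u is sent to -u, and a vector orthogonal to u is left unchanged. The particular vectors are illustrative choices.

```python
import math

def householder(u):
    """Return H = I - 2 u u^T for a real unit vector u (list of floats)."""
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)]
            for i in range(n)]

def apply(m, x):
    """Matrix-vector product."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]

u = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # illustrative unit vector
H = householder(u)
reflected_u = apply(H, u)            # expected to equal -u
fixed = apply(H, [1.0, -1.0])        # (1, -1) is orthogonal to u
print(reflected_u, fixed)
```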

True/False 7.5 

(a) False 

(b) False 

(c) True 

(d) False 

(e) False 

Chapter 7 Supplementary Exercises 



1. 



4' 


-1 


3 4" 


5 




5 5 


3 




4 3 


5 




5 5 



4 


0 


5 


9 


4 


25 


5 


12 


3 


25 


5 


1 


1 


~f2 


f2 


0 


0 


1 


1 


f2 





_3 
5 

12 
"25 

li 

25 



9 


12 


25 


25 


4 


3 


5 


5 


12 


16 


25 


25 



p= 



7. positive definite 

^' a. parabola 
b. parabola 

Exercise Set 8.1 

1. Nonlinear 
3. Linear 
5. Linear 



0 0 0 
0 2 0 
0 0 1 



a. Linear 

b. Nonlinear 

9. T(x1, x2) = (-4x1 + 5x2, x1 - 3x2); T(5, -3) = (-35, 14)

11. T(x1, x2, x3) = (-x1 + 4x2 - x3, 5x1 - 5x2 - x3, x1 + 3x3); T(2, 4, -1) = (15, -9, -1)
13. T(2v1 - 3v2 + 4v3) = (-10, -7, 6)
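
The stated values T(5, -3) and T(2, 4, -1) give a quick consistency check on the formulas. In the sketch below each transformation is represented by a coefficient matrix; the entries are read off from the formulas as reconstructed from the stated values, so treat them as an inference rather than a quotation of the exercise.

```python
def apply(matrix, x):
    """Evaluate the linear transformation with the given standard matrix at x."""
    return tuple(sum(a * b for a, b in zip(row, x)) for row in matrix)

# Coefficient matrices inferred from the formulas and the stated values.
A2 = [[-4, 5],
      [1, -3]]
A3 = [[-1, 4, -1],
      [5, -5, -1],
      [1, 0, 3]]

print(apply(A2, (5, -3)))
print(apply(A3, (2, 4, -1)))
```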
15. (a) 
17. (a) 
19. (a) 

21. a. (1, -4) 

b. (1,0,0), (0,1,0), (|, -4, ij 

2 3 

c. x,x^,x^ 

-1" 

6 
4 

b. [-141 
19 
11 

c. Rank(T) = 2, nullity(T) = 1

d. Rank(A) = 2, nullity(A) = 1



0 
1 



-1 




-4 


-1 




2 


1 




0 


0 




7 



c. Rank(T) = nullity(T) = 2

d. Rank(A) = nullity(A) = 2

27. a. Kernel: y-axis; range: xz-plane

b. Kernel: x-axis; range: yz-plane

c. Kernel: the line through the origin perpendicular to the plane y = x; range: plane y = x

29. a. Nullity(T) = 2

b. Nullity(T) = 4

c. Nullity(T) = 3

d. Nullity(T) = 1

31. a. 3 

b. No 

33. A line through the origin, a plane through the origin, the origin only, or all of R^3
(b) No 

41. ker(D) consists of all constant polynomials.

a. T{f{x))=f^'^{x) 

b. T(f(x)) = f^(n+1)(x)

True/False 8.1 

(a) True 

(b) False 

(c) True 

(d) False 

(e) True 

(f) True 

(g) False 

(h) False 

(i) False 

Exercise Set 8.2 




11. 



a. ker(T) = {0}; T is one-to-one

^' ker(T) = |^|--|, ijj, 7 is not one-to-one 

c. ker(T) = {0}; T is one-to-one

d. ker(T) = {0}; T is one-to-one

e. ker(T) = {k(1, 1)}; T is not one-to-one

f. ker(T) = {k(0, 1, -1)}; T is not one-to-one

a. Not one-to-one 

b. Not one-to-one 

c. One-to-one 

a. ker(T) = {k(-1, 1)}

b. T is not one-to-one since ker(T) ≠ {0}.

a. T is one-to-one

b. T is not one-to-one

c. T is not one-to-one

d. T is one-to-one



\a b 


c 


b d 


e 


c e 


/_ 


([a b' 




|c d 


)= 



a b 
c d 



T(ax^ -\- bx^ + cx) : 



T{a + b sm(x) H-c cos(x)) = 



13. T is not one-to-one since, for example, f(x) = x^ (x - 1)^ is in its kernel.
15. Yes; it is one-to-one 

17. T is not one-to-one since, for example, a is in its kernel.
19. Yes 

True/False 8.2 

(a) False 

(b) True 

(c) False 

(d) True 

(e) False 

(f) False 

Exercise Set 8.3 

1. a. (T2 ∘ T1)(x, y) = (2x - 3y, 2x + 3y)

b. (T2 ∘ T1)(x, y) = (4x - 12y, 3x - 9y)

c. (T2 ∘ T1)(x, y) = (2x + 3y, x - 2y)

d. (T2 ∘ T1)(x, y) = (0, 2x)
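
A composition is evaluated by applying T1 first and then T2. The sketch below uses a hypothetical pair T1(x, y) = (2x, 3y) and T2(x, y) = (x - y, x + y); these definitions are assumptions chosen because they reproduce the formula in part (a), not text taken from the exercise.

```python
def compose(outer, inner):
    """Return the composition outer ∘ inner for maps on tuples."""
    return lambda *args: outer(*inner(*args))

# Hypothetical transformations (assumptions for illustration).
def T1(x, y):
    return (2 * x, 3 * y)

def T2(x, y):
    return (x - y, x + y)

T2_after_T1 = compose(T2, T1)
print(T2_after_T1(1, 1))  # agrees with (2x - 3y, 2x + 3y) at (1, 1)
```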

3. a. « + ^ 

b. (T2 ∘ T1)(A) does not exist since T1(A) is not a 2 x 2 matrix.



5. 72(v) = lv 



11. 



a. T has no inverse.



3^2 - ^^3 



3 5 1 



-XI --^2 +2^3 
4^^1+1x2 + ^X3 
ixi I ^^2-^X3 

—2x1 ~ I ^3 
—4^1 — 5x2 * 2^3 



13. 



15. 



a. i2j ^ 0 for j = 1, 2, 3 
b 



2 W) = - (^2 o Ti) (^(x)) = - 1) 



17. (a) (1,-1) 

(d) T~\2,3) = 2-\-x 

21. a. T1 ∘ T2 = T2 ∘ T1

b. T1 ∘ T2 ≠ T2 ∘ T1

c. T1 ∘ T2 = T2 ∘ T1



True/False 8.3 

(a) True 

(b) False 

(c) False 

(d) True 

(e) False 

(f) True 

Exercise Set 8.4 

1. o To 0 0 

1 0 0 

0 1 0 

0 0 1 



1 -1 

0 1 

0 0 

0 0 

4 ^ 

3 3 



'1 1 r 

0 2 4 
0 0 4_ 

b. 3 + 10x + 16x^2



T(vi) = 



1 

-2 



[?'(v2)]b = 



3 
-5 



, n^2) = 


'-2 
. 29 


18 1 " 




7 7 




107 24 




7 7 





li 

7 
83 
■ 7 



11. 



[nv2)]B-- 



3 
0 
=2 



[nv3)]B-- 



b. 7(vi) = 16 5\x + 19x1 7'(V2) = - 6 - 5x + 5x^, 7(v3) = 7 + 40x 15;^^ 



22 + 56X I 14;^^ 



13. 



0 0 

6 0 
0 -9 
0 0 



0 0 0 

3 0 0 

0 3 0 

0 0 3 



[Ti]b\b-- 



b. [T2 ∘ T1]_{B',B} = [T2]_{B',B''} [T1]_{B'',B}
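
The identity says that the matrix of a composition is the product of the individual matrices, taken in the same order as the composition. A minimal numerical sketch, using two small hypothetical matrices M1 and M2 in place of the matrices of T1 and T2 (the values are assumptions for illustration):

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_vec(a, v):
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

# Hypothetical matrices standing in for [T1] and [T2].
M1 = [[1, 2], [0, 1]]
M2 = [[0, 1], [1, 1]]
v = [3, 4]

step_by_step = mat_vec(M2, mat_vec(M1, v))   # apply T1, then T2
via_product = mat_vec(mat_mul(M2, M1), v)    # apply the product matrix once
print(step_by_step, via_product)
```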



19. 



0 0 0 

0 0-1 

0 1 0 

0 0 0 

0 1 0 

0 0 2 

2 1 0 

0 2 2 

0 0 2 



d. 


'2 


1 


0" 


4" 




14" 


14^2^ _ Sxe^' - 20x^e^' since 


0 


2 


2 


6 




-8 




0 


0 


2 


-10 




-20 



21. 



a. -5'- -5/' 

b. B\ B"' 



True/False 8.4 

(a) False 

(b) False 

(c) True 

(d) False 

(e) True 

Exercise Set 8.5 
1. 

IT]b 



1 -2 
0 -1 



[T]Br- 



3_ 
'11 
_2_ 
"11 



56 
"11 
3_ 
11 



[T]b = 



[T]b-- 



[T]b- 



f2 -f2 
1 1 
/2 /2 

1 0 0 

0 1 0 

0 0 0 



[T]b,= 



13 
11/2 
5 



25 
11/2 
9 



11/2 11/2 



1 0 0 
0 1 1 
0 0 0 



1 1 

0 1 



11. 



-3-/2T 



-3 + /2T 

6 
1 



13. a. λ = -4, λ = 3

Basis for eigenspace corresponding to A = 



- 2 I I ^ 



basis for eigenspace corresponding to λ = 3: 5 - 2x + x^2



21. The choice of an appropriate basis can yield a better understanding of the linear operator. 
True/False 8.5 

(a) False 

(b) True 

(c) True 

(d) True 

(e) True 

(f) False 

(g) True 

(h) False 

Chapter 8 Supplementary Exercises 

1. No. T(x1 + x2) = A(x1 + x2) + B ≠ (Ax1 + B) + (Ax2 + B) = T(x1) + T(x2), and if c ≠ 1, then T(cx) = cAx + B ≠ c(Ax + B) = cT(x).
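
Both failures are easy to see numerically. The sketch below uses a hypothetical matrix A and a nonzero vector B (the specific numbers are assumptions) and compares the two sides of the additivity and homogeneity conditions.

```python
# Hypothetical data for T(x) = Ax + B with B nonzero.
A = [[1, 2], [3, 4]]
B = [1, 1]

def T(x):
    """Affine map x -> Ax + B."""
    return [sum(A[i][j] * x[j] for j in range(2)) + B[i] for i in range(2)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

x1, x2 = [1, 0], [0, 1]
lhs_add, rhs_add = T(add(x1, x2)), add(T(x1), T(x2))
lhs_hom, rhs_hom = T(scale(2, x1)), scale(2, T(x1))
print(lhs_add, rhs_add, lhs_hom, rhs_hom)
```

In each comparison the two sides differ by a multiple of B, which is why the map is linear only when B = 0.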

5. a. T(e3) and any two of T(e1), T(e2), and T(e4) form bases for the range; (-1, 1, 0, 1) is a basis for the kernel.
Rank = 3, nullity = 1 

7. a. Rank(T) = 2 and nullity(T) = 2

b. T is not one-to-one.
11. Rank =3, nullity =1 
13. 



15. 



17. 



'l 0 


0 


o" 








0 0 


1 


0 








0 1 


0 


0 








0 0 


0 


1 












'-4 


0 


9' 






1 


0 


-2 






0 


1 


1 






"1 




-1 


r 


[T]b-- 




0 




1 


0 






1 




0 


-1 



19. (b) f(x)=x, g(x) = \ 
(c) f(x)=e\ g(x)=e- 



21. 
25. 



(d) The points are on the graph. 



0 0 0 

1 0 0 

0 1 0 

0 0 I 



0 0 0 



Exercise Set 9.1 

1. x1 = 2, x2 = 1
3. x1 = 3, x2 = -1

5. x1 = -1, x2 = 1, x3 = 0

7. x1 = -1, x2 = 1, x3 = 0

9. x1 = -3, x2 = 1, x3 = 2, x4 = 1
11. , 



A = LU-- 



2 0 0 
-2 1 0 
2 0 1 



1 1 -1 

2 2 



0 0 
0 0 



A = LiDUi = 



1 


0 


0' 


"2 


0 


0' 


-1 


1 


0 


0 


1 


0 


1 


0 


1 


0 


0 


1 



1 

2 

0 0 
0 0 



c. 


1 


0 


0" 


'2 


1 


-1" 


A = L2V2 = 


-1 


1 


0 


0 


0 


1 




1 


0 


1 


0 


0 


1 



13. 



15. 



0 0 

1 0 
-2 1 



^1 ^2-- 



3 0 0 
0 2 0 
0 0 1 

11 
■ 17 



1 -4 2 

0 1 0 

0 0 1 

■ 17 



17. 



19. 





'1 


0 


0] 


A = 


0 


0 


1 




0 


1 


0 



(b) 



0 


0" 


1 


4 ^ 




2 


0 


0 


^ \ 


0 


1 










0 


0 1 


1 


0" 


a 


b 


c_ 


1 


0 


ad — be 


a 






a 



True/False 9.1 

(a) False 

(b) False 

(c) True 

(d) True 

(e) True 

Exercise Set 9.2 

^' a. A3 dominant 

b. No dominant eigenvalue 



3. r 0.98058 
^^^[-0.19612 





0.98837" 




0.98679" 




0.98715" 




_-0.15206_ 


; X3W 


^0.16201_ 




_-0.15977_ 



dominant eigenvalue: λ = 2 + √10 ≈ 5.16228;
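
The iterates listed in this answer are what the power method with Euclidean scaling produces. The sketch below assumes the matrix A = [[5, -1], [-1, -1]] and the starting vector x0 = (1, 0); both are inferred from the numbers in the answer (this A has eigenvalues 2 ± √10, and its first iterate is (0.98058, -0.19612)), not quoted from the exercise.

```python
import math

def mat_vec(a, x):
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def normalize(x):
    length = math.sqrt(dot(x, x))
    return [c / length for c in x]

def power_method(a, x0, steps):
    """Return the list of normalized iterates x1, x2, ..., x_steps."""
    x, iterates = x0, []
    for _ in range(steps):
        x = normalize(mat_vec(a, x))
        iterates.append(x)
    return iterates

def rayleigh(a, x):
    """Rayleigh quotient, used as the eigenvalue estimate."""
    return dot(mat_vec(a, x), x) / dot(x, x)

A = [[5.0, -1.0], [-1.0, -1.0]]   # assumed matrix, inferred from the answer
iterates = power_method(A, [1.0, 0.0], 20)
estimate = rayleigh(A, iterates[-1])
print(iterates[0], iterates[1], estimate)
```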



dominant eigenvector: 



10 



XI = 

X4 



-0.16228_ 
, aC1) = 6; x2 = [-J-^], A® = 6.6, X3 . 



-1 

1 

-0.53488 
1 



-0.53846j^ aC^«6.60550, 



A'^'^p^i 6.60555; 



dominant eigenvalue: λ = 3 + √13 ≈ 6.60555;

3 



dominant eigenvector: 



^26 I 4/T3 

2 t f[3 
/26 I 4/T3 



=0.47186 
0.88167 





1 




1 




XI = 


_-0.5_ 




_-0.8_ 


, X3?i; 



1 1 

-0.929J 



9. 
13. 



b. λ(1) = 2.8, λ(2) ≈ 2.976, λ(3) ≈ 2.997
c. Dominant eigenvalue: λ = 3; dominant eigenvector:
d. 0.1% 
2.99993; 



0.99180 
1.00000 



Starting with 



b. 



Starting with 



, it takes 8 iterations. 



, it takes 8 iterations. 



Exercise Set 9.3 



1. 


r 




'2 


ho = 


2 


, ao = 


0 




2 




3 



3. 


"0.39057" 




'0.60971' 




0.65094 




0 




0.65094 




0.79262 



5. Sites 1 and 2 (tie); sites 3 and 4 are irrelevant 
7. Site 2, site 3, site 4; sites 1 and 5 are irrelevant 
Exercise Set 9.4 
1. a. ≈ 0.067 second

b. ≈ 66.68 seconds

c. ≈ 66,668 seconds, or about 18.5 hours
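
These times are consistent with a simple cost model. The sketch below assumes roughly (2/3)n^3 flops for the forward phase of Gaussian elimination, n^2 flops for the backward phase, and a machine that performs 10^10 flops per second; the flop rate and the system sizes n = 10^3, 10^4, 10^5 are inferred from the answers rather than quoted from the exercise.

```python
def solve_time_seconds(n, flops_per_second=1e10):
    """Estimated time to solve an n x n linear system by Gaussian elimination.

    Assumes about (2/3)n^3 flops for the forward phase and n^2 flops for the
    backward phase (large-n approximations), at the given flop rate.
    """
    forward = (2 / 3) * n ** 3
    backward = n ** 2
    return (forward + backward) / flops_per_second

for n in (10 ** 3, 10 ** 4, 10 ** 5):
    print(n, solve_time_seconds(n))
```

The largest case, divided by 3600, gives the figure of about 18.5 hours.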

3. a. ≈ 9.52 seconds

b. ≈ 0.0014 second

c. ≈ 9.52 seconds
≈ 28.6 seconds

^' a. 6.67 X 10"^ s for forward phase, 10 s for backward phase 

b. 1334 
7. flops 
9. 2n^3 - n^2 flops

Exercise Set 9.5 

1. 0, {I 
3. {I 



A = 



A = 



1 
1 

2 
1 



1 
1 

1 

■/^ 

_2_ 



{2 0 

0 /2 



1 0 
0 1 



8 0 
0 2 



1 2 

2 1 

7? 



A = 



6 



2 J_ 
3 

1 0 

3 3 

_2 J_ _i2 

3 6 



11. 



i4 = 



1 J_ 

/3 /2 

_J_ J_ 

"1/3 /2 



True/False 9.5 

(a) False 

(b) True 

(c) False 

(d) False 

(e) True 

(f) False 

(g) True 

Exercise Set 9.6 



J_ 

J_ 



3/2 0 
0 0 
0 0 



{3 0 
0 {2 



_j_ j_ 

/2 /2 
J_ J_ 
1/2 /2 



1 0 
0 1 



1 1 
{2 {2 



J_ J_ 

1 J_ 



3/2 



/5 



2 
3 
1 

3 

_2 

3 

1 
1 

J_ 



/3 0 ' 
0 /2 



1 1 
■f2 f2 



1 0 
0 1 



[1 0] I /2 



0 
1 

f2 

1 



[0 1] 



9. 70,100 numbers must be stored; A has 100,000 entries 
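
The count follows from how a rank-k singular value approximation is stored: k singular values, k left singular vectors and k right singular vectors. The sketch below assumes a 200 x 500 matrix approximated at rank 100; those dimensions are an assumption, chosen because they reproduce both numbers in the answer.

```python
def svd_storage(m, n, k):
    """Numbers stored for a rank-k SVD approximation of an m x n matrix:
    k singular values, k vectors of length m, and k vectors of length n."""
    return k * (m + n + 1)

m, n, k = 200, 500, 100          # assumed dimensions and rank
print(svd_storage(m, n, k), m * n)
```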

True/False 9.6 

(a) True 

(b) True 

(c) False 

Chapter 9 Supplementary Exercises 



1. 



2 0 
-2 1 

2 0 0 
1 2 0 
1 1 2 



I] 



-3 

0 2 

1 2 3 
0 1 2 
0 0 1 



A=3, v = 



1 

1 



X5 J« 



1 



0 



0 1 



0.7100 
0.7041 
1 

0.9918 

J_ 

f2 

0 



-J- 0 



2 0 
0 0 
0 0 



0.7071 
0.7071 



1 1 

1 1 
/2 {2 



11. 











1 


1 " 




















2 


2 












12 


0 


6 ■ 




1 


1 






'2 


1 


2 


4 


-8 


10 




2 


2 


"24 


0 ■ 


3 


3 


3 


4 


-8 


10 




1 


1 


_ 0 


12_ 


2 


2 


1 


12 


0 


6 




2 
1 
2 


2 
1 

2 






3 


3 


3 



Exercise Set 10.1 



a. y = 3x - 4
b. y = -2x + 1



2. a. x^2 + y^2 - 4x - 6y + 4 = 0 or (x - 2)^2 + (y - 3)^2 = 9
b. x^2 + y^2 + 2x - 4y - 20 = 0 or (x + 1)^2 + (y - 2)^2 = 25
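
The two forms of each circle are related by completing the square: x^2 + y^2 + dx + ey + f = 0 has center (-d/2, -e/2) and squared radius d^2/4 + e^2/4 - f. A short sketch of that conversion (the function name and parameter names are illustrative):

```python
import math

def circle_center_radius(d, e, f):
    """Convert x^2 + y^2 + d*x + e*y + f = 0 to (center, radius)."""
    cx, cy = -d / 2, -e / 2
    r_squared = cx * cx + cy * cy - f
    if r_squared <= 0:
        raise ValueError("equation does not describe a circle")
    return (cx, cy), math.sqrt(r_squared)

print(circle_center_radius(-4, -6, 4))    # part (a)
print(circle_center_radius(2, -4, -20))   # part (b)
```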

3. x^2 + 2xy + y^2 - 2x + y = 0 (a parabola)

a. x + 2y + z = 0

b. -x + y - 2z + 1 = 0

x 7 z 0 

^1 yi ^1 ^ 

^2 ^2 ^ 

X3 73 Z3 1 

b. x + 2y + z = 0; -x + y - 2z = 0

a. x^2 + y^2 + z^2 - 2x - 4y - 2z = -2 or (x - 1)^2 + (y - 2)^2 + (z - 1)^2 = 4

b. x^2 + y^2 + z^2 - 2x - 2y = 3 or (x - 1)^2 + (y - 1)^2 + z^2 = 5



10. 



7 ^ 



y\ ^1 ^1 1 

72 ^2 ^2 1 

11. The equation of the line through the three collinear points

12. 0 = 0 

13. The equation of the plane through the four coplanar points 
Exercise Set 10.2 

!• xi = 2, X2 = maximum value ofz= 

2. No feasible solutions 

3. Unbounded solution 

4. Invest $6000 in bond A and $4000 in bond B; the annual yield is 
5' ^ cup of milk, ounces of corn flakes; minimum cost = = 18.6& 

^' a. xi >0 and X2 > 0 are nonbinding; 2^:1 + 3x2 ^ binding 

b. X'l — 7:2 < V for V < _ 3 is binding and for v < _ 5 yields the empty set. 

c. X2<v for y .: 8 is nonbinding and for v - .. 0 yields the empty set. 

7. 550 containers from company A and 300 containers from company B; maximum shipping charges = $2110

8. 925 containers from company A and no containers from company B; maximum shipping charges = $2312.50

9. 0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost = 24.8¢
Exercise Set 10.3 

1. 700 



2. 



a. 5 

b. 4 

a- Ox, Iy units; sheep, |y unit 

b. First kind, measure; second kind, measure; third kind, measure 
25 25 25 



X\ : 



(fl2 + tat3 + ... + fl„) -a\ 
«-2 



--ai-x\, I = 2, 3, n 



b. Exercise 7(b); gold, 30-^ minae; brass, 9-^ minae; tin, 14-^ minae; iron, 5-^ minae 

a. 5x + y + z - K = 0
x + 7y + z - K = 0
x + y + 8z - K = 0

x = 21t/131, y = 14t/131, z = 12t/131, K = t, where t is an arbitrary number

b. Take t = 131, so that x = 21, y = 14, z = 12, K = 131.

c. Take t = 262, so that x = 42, y = 28, z = 24, K = 262.
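
The general solution can be checked by substituting it back into the homogeneous system. The sketch below uses the system 5x + y + z - K = 0, x + 7y + z - K = 0, x + y + 8z - K = 0 with the one-parameter solution x = 21t/131, y = 14t/131, z = 12t/131, K = t; both are as read from the answer in parts (a) through (c), so treat the coefficients as a reconstruction.

```python
from fractions import Fraction

def residuals(x, y, z, k):
    """Left-hand sides of the three equations; all zero for a solution."""
    return (5 * x + y + z - k,
            x + 7 * y + z - k,
            x + y + 8 * z - k)

def solution(t):
    """One-parameter family of solutions, in exact arithmetic."""
    t = Fraction(t)
    return (21 * t / 131, 14 * t / 131, 12 * t / 131, t)

print(solution(131), residuals(*solution(131)))
print(solution(262), residuals(*solution(262)))
```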



7 ? 
Legitimate son, 577-^ staters; illegitimate son, 422-^ staters 

b. Gold, 30-^ minae; brass, 9^ minae; tin, 14-^ minae; iron, 5^ minae 

c- First person, 45; second person, 37^; third person, 22-^ 



Exercise Set 10.4 

2- a. S(x)= - .12643(;c-.4)^- .202n(x-,4)^ 
b. 5r(.5) = .47943; error = 0% 

^' a. The cubic runout spline 

b. = 3x^ - 2x^ I 5x f 1 

.00000042(x+ 10)^ 



.92158(x- .4) I .38942 



.00000024^^ 



.0000126(;t)^ 



- .00000004(;r- 10)'' - . 0000054 (;r - 10)^ - 



.00000022(x-20)'' - .0000066(;t-20)^ - 
Maximum at (x, S(x)) = (3.93, 1.00004) 



.00000009(;f +10)"^ - .0000121(x I 10)^ 



Six) = l 



.00000009(x)'' 
.00000004(;^- 10)^ 



.0000093(;^)^ 
.0000066(^-10)^ 



.00000004(^-20)^ - .0000053(^-20)^ - 



.000214(;t4-10) 
.000088(x) 

.000092(^-10) 
.000212(^-20) 



.000282(^-1- 10) 
.000070(;r) 
.000087(;f-10) 
.000207(;^-20) 



-I- 
-I- 



4- 
+ 

I 



.99815, 
.99987, 
.99973, 
.99823, 



-10<;t<0 
0<x<\0 

10<x<20 
20<;^<30 



.99815, -10 <x<0 

.99987, 0<x<10 

.99973, 10<x<20 

.99823, 20<;f<30 



Maximum at (x, S(x)) = (4.00, 1.00001) 



S(x) = 



-4x^ -h 3x 



0<x<0.5 



4x^-\2x^-\-9x-\ 0.5<x<l 



2-2x 0.5<;t<l 
2-2x l<;t<1.5 



^(x) 

c. The three data points are collinear. 
(b) 



4 10 0 
14 10 
0 14 1 



0 0 0 0 
10 0 0 



(b) r 



2 10 0 
14 10 
0 14 1 



0 0 0 0 
0 0 0 0 



Exercise Set 10.5 

1. o 



0 0 0 1 

0 0 0 0 

0 0 0 0 

0 14 1 

0 0 14 



0 0 0 1 

0 0 0 0 

0 0 0 0 

0 0 4 1 

0 112 



Ml 




yn-\ 


2^1 






72 




M2 




y\ 








73 




M3 


6 


yi 


2J^^3 






74 
























7«-3 


- 27„_2 




7«-i 




Mn-\ 




yn-2 




-f 




71 




Ml 






hy\ - 




yi 


4 


72 


M2 






y\ - 




2y2 




73 


M3 


6 




72 - 




273 


+ 74 
























yyi-2 - 




«-i 


+ 7« 


Mn 






yy^-i - 




yn 


4 


^yn 



x(1) = (.4, .6), x(2) = (.46, .54), x(3) = (.454, .546), x(4) = (.4546, .5454), x(5) = (.45454, .54546)



P is regular since all entries of P are positive; q = 
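
State vectors of this kind come from repeatedly multiplying by the transition matrix, and for a regular matrix they settle toward a steady-state vector. The sketch below assumes P = [[.4, .5], [.6, .5]] and the initial state (1, 0); both are inferred from the listed state vectors (.4, .6), (.46, .54), ..., not quoted from the exercise. A candidate steady state q is confirmed by checking Pq = q.

```python
from fractions import Fraction as F

# Assumed transition matrix (columns sum to 1) and initial state.
P = [[F(4, 10), F(5, 10)],
     [F(6, 10), F(5, 10)]]

def step(p, x):
    """One step of the chain: x -> P x."""
    return [sum(p[i][j] * x[j] for j in range(len(x))) for i in range(len(p))]

x = [F(1), F(0)]
iterates = []
for _ in range(5):
    x = step(P, x)
    iterates.append(x)

q = [F(5, 11), F(6, 11)]          # candidate steady-state vector
print([[float(c) for c in v] for v in iterates], step(P, q) == q)
```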



2. a. 


'.7' 




'.23' 




'.273" 




.2 


, = 


.52 


, x^ = 


.396 




.1 




.25 




.331 



P is regular, since all entries of P are positive: q = 



b. pYi 



, n = 1, 2, .... Thus, no integer power of P has all positive entries.



0 0 

1 1 



as n increases, so 



for any as n increases. 



The entries of the limiting vector 



are not all positive. 



P2 = 



7 10 
13 



1 


1 


1 


2 


4 


4 


1 


1 


1 


4 


2 


4 


1 


1 


1 


4 


4 


2 



has all positive entries; q - 



8. 54 1/6% in region 1, 16 2/3% in region 2, and 29 1/6% in region 3

Exercise Set 10.6 



1. 



0 0 0 1 
10 11 
110 1 

0 0 0 0 



0 110 0 
0 0 0 0 1 
10 0 10 
1 0 0 
1 0 0 



0 0 
0 0 



0 10 10 0 

1 0 0 0 0 0 
0 10 1 
0 0 0 0 0 
0 0 0 0 0 

1 



0 0 



0 1 



1 1 
1 
1 

0 




(a) 1 0 0 0 0 
0 10 0 0 
0 0 110 
0 0 12 1 
0 0 0 1 2 

(c) The ijth entry is the number of family members who influence both the ith and jth family members.

5- a. {PhP2.P2) 

b. {P3.P4.P5) 

c. {P2.PA.P6.Pz) and (P^f 5,^6) 
^' a. None 

7. [0 0 1 1]   Power of P1 = 5
   [1 0 0 0]   Power of P2 = 3
   [0 1 0 1]   Power of P3 = 4
   [0 1 0 0]   Power of P4 = 2
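
The power of a vertex in a dominance-directed graph is the number of one-step and two-step connections leaving it, that is, the corresponding row sum of M + M^2. A short sketch that recomputes the powers from the vertex matrix in this answer:

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def vertex_powers(m):
    """Row sums of M + M^2: one-step plus two-step dominances per vertex."""
    m2 = mat_mul(m, m)
    return [sum(m[i]) + sum(m2[i]) for i in range(len(m))]

# Vertex matrix as read from the answer above.
M = [[0, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 1],
     [0, 1, 0, 0]]

powers = vertex_powers(M)
print(powers)
```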

8. First, A; second, B and E (tie); fourth, C; fifth, D 
Exercise Set 10.7 

1- a. -5/8 
b. [0 1 0] 



c. [1 0 0 0]' 



Leti4 = 



1 1 
1 1 



for example. 



''•P*=[0 1], q* = 
P''=[0 1 0], q* 

p*=[0 0 1], q* 



v = 3 
, v = 2 

, v = 2 



p = [0 1 0 0], q = 



ill} 

ii I] 



, v= -2 



. 27 



70 
3 



P = 



[1 0], q = r I v = 3 



e. 

P =[T3 TsJ' 

5. 

P -[20 20j' ^ - 
Exercise Set 10.8 



ii 

5 



' "="20 



29 

■ 13 



1. 



a. Use Corollary 10.8.4; all row sums are less than one. 

b. Use Corollary 10.8.5; all column sums are less than one. 



c. Use Theorem 10.8.3, with x = (2, 1, 1) > Cx = (1.9, .9, .9)



3. has all positive entries. 

4. Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67 

5. $1256 for the CE, $1448 for the EE, $1556 for the ME 

6. (b) 542. 

503 

Exercise Set 10.9 

1. The second class; $15,000 

2. $223 

3. 1:1.90:3.02:4.24:5.00 

5. s/(gr^+g2"^+ • • ■ +gft"ii) 

6. 1:2:3: • ■ ■ :«-l 
Exercise Set 10.10 

1- c ro 1 1 0" 

0 0 11 

0 0 0 0 

off 0 



0 0 



1 1 

2 2 
0 0 0 0 



-2 -1 -1 -2 
-1-10 0 
3 3 3 3 



d. ^ 



0 .866 1.366 .500 
0 -.500 .366 .866 
0 0 0 0 



(b) 



(0,0,0), (1,0,0), (l^. and ^^,1,0 



(c) (0,0,0), (1,.6, 0), (1, 1.6,0), (0, 1,0) 



1 0 0 
0-10 
0 0 1 

-1 0 0 

0 1 0 
0 0 1 



1 0 0 
0 1 0 
0 0-1 



2 ^ ^ 
0 2 0 

0 0 i 



M2 = 



1 1 

2 2 
0 0 

0 0 



1 0 0 
0 COS 20 —sin 20 
0 sin 20 cos 20 



cos (-45 ) 0 sin (-45 ) 
M^= 0 1 0 

-sin ( -45') 0 cos (-45') 
b. Pf = M^M/^M2{M\P + M2) 



My. 



.3 0 0 
0 .5 0 
0 0 1 

cos 35 
0 



, M2-- 



0 sin 35 

1 0 

—sin 35 0 cos 35 



1 0 
0 cos 45' 
0 sin 45' 



0 

-sin 45' 

cos 45 



-1 0 

0 0 
0 1 



1 1 

0 0 
0 0 



cos (-45 ) -sin (-45 ) 0 
sin (-45') cos (-45') 0 
0 0 1 





"0 


0 • 


• 0" 




'2 


0 


0" 




0 


0 • 


• 0 


, M7 = 


0 


1 


0 




1 


1 • 


• 1 




0 


0 


1 



b. Pi = M7(MiM4(M2MiP + Mi) + Mg) 



^1 = 



^3 = 



COS ^ 0 im.^ 

0 1 0 

—sin /? 0 cos ,J 

cos^ 0 sin^ 

0 1 0 

— sin0 0 zosO 

cos ^ 0 —sin ^ 

0 1 0 

sin ^ 0 cos 



R2 = 



. R4 = 



COS a —sin a 0 

sin a cos Q 0 

0 0 1 

cos Ck sina 0 

—sin a cos Q: 0 

0 0 1 



M-- 



1 0 
0 1 



0 

0 70 



0 0 1 

0 0 0 



10 0-5 

0 10 9 
0 0 1-3 

0 0 0 1 



Exercise Set 10.11 
1. 







0 






1 






4 






1 


H 




4 






0 




' 1 " 






4 






3 






4 






1 






4 






3 






4 





1 1 

4 4 

0 0 

0 0 



1 1 0 
4 4 







" 3 " 




' 7 " 




"15" 




8 




16 




32 




64 




5 




11 




23 




47 




8 




16 




32 




64 




1 




3 




7 




15 




8 




16 




32 




64 




5 




11 




23 




47 




8 




16 




32 




64 





"64 
J_ 

"64 
J_ 

"64 
J_ 

"64 



d. forii andi3, -12.9%; fori2 andi4, 5.2% 



2. 1 

2 



152542543 
444444444 



t(2) 



13 18 



22 13 7 21 16 



16 16 16 16 16 16 16 16 
Exercise Set 10.12 



10 
16 



1. 



(c) 



=^3 



= /ll 27 A 
U2' 22 J 



a. Jl) . 
X3 ■ 

X3 - 

x^- 

X3 ■ 

X3 ■ 

X3 - 

X3 ■ 



(1.40000, 1.20000) 
(1.41000, 1.23000) 
(1.40900, 1.22700) 
(1.40910, 1.22730) 
(1.40909, 1.22727) 

(1.40909, 1.22727) 
b. Same as part (a) 
^- x^^'' = (9.55000, 25.65000) 
xf' = (.59500, - 1.21500) 
xf = (1.49050, 1.47150) 
xf = (1.40095, 1.20285) 
x^^ = (1.40991, 1.22972) 
xf = (1.40901, 1.22703) 



4. Xi=(l,l),X2=(2, 0),X3 = (1,1) 

7. X7+xs + X9 = \3.00 
X4 + X5^X6= 15.00 

x\ -\-X2-\-X2 = 8.00 
.82843(;r6 + ^s) + 58579x9 = 14.79 
1.41421(x3 4-X5-f;t7) = 14.31 
.82843(;^2 + ^4) + .58579:^1 = 3.81 
X2-^X6-^xg = 18.00 
X2'^X5=\-xs= 12.00 
x\ X4-\- xj = 6.00 
.82843(;t2 < ^6) + 58579x3 = 10-51 
1.41421(^1 4x54x9) = 16.13 
.82843(x4 4 xg) 4 .58579x7 = 7.04 

8. X7 4 xg 4 X9 

X4 + X5'^x^ 
XI +X2 + X3 

.04289(X3 4X5 4X7) 4.75000(x6 4xg) 4 .61396x9 
.91421 (X3 + X5 +X7) 4 .25000(x2 +X4 + X6 4 xg) 
. 04289(X3 4X5 4x7) 4 . 75000(X2 4x4) + .61396x1 

X2 + X5 4X8 
XI 4x44x7 

.04289(xi 4x5 4x9) 4 .75000(X2 4 xg) 4 .61396x3 
.91421(xi 4x54x9) 4 .25000(x24x44x64xg) 
.04289(xi 4 X5 4x9) 4 .75000(x4 4 xg) 4 .61396x7 = 

Exercise Set 10.13 
1. 

'1 0 



: 13.00 
: 15.00 

= 8.00 
: 14.79 
: 14.31 
= 3.81 
: 18.00 
= 12.00 
= 6.00 

: 10.51 
: 16.13 

7.04 



i = 1, 2, 3, 4, where the four values of 







"0" 
_0_ 




" 13 ' 




" 0 ■ 




are 




25 




13 






_ 0 _ 




25 _ 



and 



-(ft) 



s PS .47; ^h(^ ~ ^(^) f ^0 ^ -47) = 1.8 . ... Rotation angles: (upper left); _90° (upper right); igQ^ (lower left); jgO^ (lower right); 
(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), (1, 2, 0), (2, 1, 3), (2, 0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3) 



a. (i) s = 1/3; (ii) all rotation angles are 0°; (iii) d_H(S) = ln(7) / ln(3) = 1.771.... This set is a fractal.



b. (i) s = 1/2; (ii) all rotation angles are 180°; (iii) d_H(S) = ln(3) / ln(2) = 1.584.... This set is a fractal.



c. (i) s= (ii) rotation angles: _90 (top); IgQ (lower left); \^ 



d. (i) s = —; (ii) rotation angles: 90^ (upper left); 180° (upper right); 180° (lower right) (iii) d}{(^ 
= .S509...,e= - 2.69 ... 



(lower right); (iii) dni^ = ln(3) / ln(2) = 1 . 584 . . . . This set is a fractal. 

ln(3) /ln(2) = 1.584 . ...This set is a fractal. 



6. (0.766, 0.996) rounded to three decimal places 

7. d_H(S) = ln(16) / ln(4) = 2



b(4)/ 



:4.818.. 



9. d = ln(8) / ln(2) = 3; the cube is not a fractal. 

10. k = 20; s = 1/3; d_H(S) = ln(20) / ln(3) = 2.726...; the set is a fractal.
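
For a self-similar set built from k copies of itself, each scaled by a factor s, the dimension used in these answers is ln(k) / ln(1/s). A minimal sketch of that formula (the function name is illustrative), evaluated for the pairs (k, s) that appear in this exercise set:

```python
import math

def hausdorff_dimension(k, s):
    """Dimension ln(k) / ln(1/s) of a self-similar set made of k copies,
    each scaled by the factor s (0 < s < 1)."""
    return math.log(k) / -math.log(s)

print(hausdorff_dimension(20, 1 / 3))   # k = 20, s = 1/3
print(hausdorff_dimension(8, 1 / 2))    # the cube: an integer dimension
print(hausdorff_dimension(2, 1 / 3))    # k = 2, s = 1/3
```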



Second iterate 



Third iterate 
Fourth iterate



12. 



d_H(S) = ln(2) / ln(3) = 0.6309...
Area of S0 = 1; area of S1 = 8/9 = 0.888...; area of S2 = (8/9)^2 = 0.790...; area of S3 = (8/9)^3 = 0.702...; area of S4 = (8/9)^4 = 0.624...



Exercise Set 10.14 

1. n(250) = 750, n(25) = 50, n(125) = 250, n(30) = 60, n(10) = 30, n(50) = 150, n(3750) = 7500, n(6) = 12, n(5) = 10 
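
These periods can be recomputed by brute force. The sketch below assumes the map in question is Arnold's cat map with matrix C = [[1, 1], [1, 2]] acting modulo p, and finds the smallest n with C^n ≡ I (mod p); the choice of matrix is an assumption, since the exercise statement is not reproduced here.

```python
def cat_map_period(p):
    """Smallest n >= 1 such that C^n = I (mod p), where C = [[1, 1], [1, 2]]."""
    identity = (1 % p, 0, 0, 1 % p)
    m = (1 % p, 1 % p, 1 % p, 2 % p)   # current power of C, stored row by row
    n = 1
    while m != identity:
        # Multiply the current power by C on the right, reducing mod p.
        m = ((m[0] + m[1]) % p, (m[0] + 2 * m[1]) % p,
             (m[2] + m[3]) % p, (m[2] + 2 * m[3]) % p)
        n += 1
    return n

for p in (5, 6, 10, 25, 250):
    print(p, cat_map_period(p))
```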



2. 



One 1-cycle: { (0, 0) ) ; one 3-cycle: 



0), (|, I). (0, 1)}; .0 ...e. {(|, 0), (i I). (|. 0). (|, I)} a. {(0, |), (|, 1), (o, 1), (f |)}; 

{{^' 1} (f !)■ (!■ f )■ (f ■ i). (I- 1)- (f f )■ e- !)■ (!■ !)■ (!■ i)- (!■ f )■ (!• f )■ ~ 
i- ^} ih 1} [l 1} (f 1} (f 1} (!■ i). (f °} (f 1} (!■ I), (f !)■ (f ■ 1} (I ~ 



(2 5 

U' 6 



5 

6' 6j 
.n(6) = 12 



. and 



(a) 3,7, 10,2, 12, 14, 11, 10,6, 1,7,8,0,8, 8, 1,9, 10,4, 14,3,2,5,7, 12,4, 1,5,6, 11,2, 13,0, 13, 13, 11,9,5, 14,4,3,7,... 
(c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5),... 



(e) The first five iterates of o) are 

(b) 



21 
101 



and 



\m ' 101 / 



The matrices of Anosov automorphisms are 



3 2 
1 1 



and 



5 7 
2 3 



(c) The transformation effects a rotation of S through 90° in the clockwise direction.



(0, I) 



to. 1/2) 



(0.0) 



In region I 




<»'»'2>P]^[l 2]^ * t 



lO.I)_(l/2, I) (1.1) 

in' 



(1.0) 



(0.0) (1/2,0) (1.0) 



m region J 



_? ;mregionIII:[^] = J_jj;inregionIV:p] = 



-1 

-2 



and j form one 2-cycle, and y j and ^ j form another 2-cycle. 



14. Begin with a 101 x 101 array of white pixels and add the letter 'A' in black pixels to it. Apply the mapping to this image, which will scatter the black pixels throughout the image. Then superimpose the letter 'B' in black pixels onto this image. Apply the mapping again and then superimpose the letter 'C' in black pixels onto the resulting image. Repeat this procedure with the letters 'D' and 'E'. The next application of the mapping will return you to the letter 'A' with the pixels for the letters 'B' through 'E' scattered in the background.

Exercise Set 10.15 
1. 



a. 


GIYUOKEVBH 


b. 


SFANEFZWJH 


a. 




"12 7 




23 15 


b. 


Not invertible 


c. 




1 19 




23 24 


d. 


Not invertible 


e. 


Not invertible 


f. 




'15 12 






21 5 



3. WE LOVE MATH 
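4. Deciphering matrix = $\begin{bmatrix} 7 & 15 \\ 6 & 5 \end{bmatrix}$; enciphering matrix = $\begin{bmatrix} 7 & 5 \\ 2 & 15 \end{bmatrix}$

5. THEY SPLIT THE ATOM

6. I HAVE COME TO BURY CAESAR

7. a. 010110001
   b. $\begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$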



8. A is invertible modulo 29 if and only if $\det(A) \not\equiv 0 \pmod{29}$.
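The invertibility test in this exercise set can be checked mechanically. The following is a minimal sketch, not the text's own procedure: it assumes a 2 × 2 integer matrix stored as nested lists, and the helper name `inverse_mod` is hypothetical. For a general modulus m the determinant must be relatively prime to m; when m is prime, as with 29, that reduces to the determinant not being congruent to 0.

```python
from math import gcd


def inverse_mod(matrix, m):
    """Inverse of a 2x2 integer matrix modulo m, or None if none exists."""
    (a, b), (c, d) = matrix
    det = (a * d - b * c) % m
    if gcd(det, m) != 1:
        return None  # det has no reciprocal modulo m
    det_inv = pow(det, -1, m)  # modular reciprocal (requires Python 3.8+)
    # Adjugate formula: det^{-1} * [[d, -b], [-c, a]], reduced modulo m.
    return [[(det_inv * d) % m, (-det_inv * b) % m],
            [(-det_inv * c) % m, (det_inv * a) % m]]


# Deciphering matrix from answer 4, inverted modulo 26.
print(inverse_mod([[7, 15], [6, 5]], 26))
# A matrix chosen for illustration with determinant 0, hence not invertible modulo 29.
print(inverse_mod([[2, 4], [1, 2]], 29))
```

Applied to the deciphering matrix of answer 4, the function returns the enciphering matrix listed there, so the two are inverses of each other modulo 26.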
Exercise Set 10.16 



2. 



1 



M + 1 



(^ato-^o) 



«=1,2,. 



3 2 1 ^ 

^ 6(4) 



1 



6(4) 



— (2t:zo-^0-4^o) 



« = 0, 1,2, 



5 1 ^ 

«2« = ^ + -^^(2^0 - ^0 - 4co) 



1,2, 



^2« = ^ - -^^(2^0 - ^0 - 4co) 



^* Eigenvalues: \\ = 1, A2 = eigenvectors: ei ■ 







r 


_0_ 


- 62 = 


_-i_ 



5. 12 generations; .006% 



6. 



1 + 4 



1 



3 >i«+2 



1 

2 

0 
0 
0 
0 

1 

2 



^ + i7?r^C-347?)(l-/?)"^^] 



/t«+2 



10 0 0 

0 0 0 0 

0 0 0 0 

0 0 0 1 

Exercise Set 10.17 

a. - 



b. 3,(1) = 



"100" 




"175" 


, x® = 


"250" 




"382" 


_ 50_ 




. 50_ 




. S8_ 




_125_ 



857 
285 



855 
287 



7. 2.375 

8. 1.49611 

Exercise Set 10.18 



Yield = 33y% of population; xi : 



Yield = 45.8% of population; xi ■- 



; harvest 57.9% of youngest age class 



XI = 



1.000 




2.090 


.845 




.845 


.824 




.824 


.795 




.795 


.755 




.755 


.699 




.699 


.626 


.626 


.532 




.532 


0 




.418 


0 




0 


0 




0 


0 




0 



1.090 t .418 .QQ 
7.584 



5. _ ^1+^2^1-^ ' 



I aj^ibib2 ' • -bj-2 



ajbib2 ■ • 'bj-i + 
Exercise Set 10.19 

^' ^ + 4 cos ^ + cos 2t I ^cos 3t 

2- , X,,3^.4^cosf . I ^cos ^ 

3- i + -i sin^ - ^ cos 2^ - ^r|- cos 4^ 
5. r 8T f 1 



cos 3^ — 



1 



. cos^ + ^cos4^4^cos-^ + 



(2«-l)(2«-f- 1) 
lOir^ _^ 1 



cos ntj 
2n-t 



{2n) 



2 7 



Exercise Set 10.20 

^' a. Yes; v = jvi + |v2 -f |v3 

b. No; V = "Ivi + yV2 - y V3 

c. Yes; v = "Ivi + jV2 I OV3 

d. Yes;v = ^vi I ^V2 \ 

2. m = number of triangles = 7, n = number of vertex points = 7, k = number of boundary vertex points = 5; Equation (7) is 7 = 2(7) − 2 − 5.

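3. $\mathbf{w} = M\mathbf{v} + \mathbf{b} = M(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) + (c_1 + c_2 + c_3)\mathbf{b}$

$= c_1(M\mathbf{v}_1 + \mathbf{b}) + c_2(M\mathbf{v}_2 + \mathbf{b}) + c_3(M\mathbf{v}_3 + \mathbf{b}) = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$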
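The identity in answer 3 depends on the coefficients summing to 1, which is what lets the vector b be distributed across the three terms. The sketch below checks it numerically for one case; the matrix, vectors, and coefficients are hypothetical values chosen only for illustration, and the helper names `affine` and `combine` are not from the text.

```python
def affine(M, b, v):
    """Apply w = M v + b to a 2-vector v (M is 2x2, given as nested lists)."""
    return [M[0][0] * v[0] + M[0][1] * v[1] + b[0],
            M[1][0] * v[0] + M[1][1] * v[1] + b[1]]


def combine(coeffs, vectors):
    """Return the linear combination sum(c_i * v_i) of 2-vectors."""
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(2)]


# Hypothetical data chosen only for illustration (not from the exercise).
M = [[2, 1], [0, 3]]
b = [1, -1]
v1, v2, v3 = [0, 0], [4, 0], [0, 4]
c = [0.25, 0.5, 0.25]  # convex-combination coefficients; they sum to 1

lhs = affine(M, b, combine(c, [v1, v2, v3]))               # M v + b
rhs = combine(c, [affine(M, b, v) for v in (v1, v2, v3)])  # c1 w1 + c2 w2 + c3 w3
print(lhs, rhs)
```

Both sides evaluate to the same point, so the image of v keeps the same coefficients with respect to the images of the three vertices.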

4. a ^2 




M = 



M = 



M-- 



2 0 





r 


, b = 


2 







a. Two of the coefficients are zero. 

b. At least one of the coefficients is zero. 

c. None of the coefficients are zero. 
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